I. Basis of the report

1. This report has been drawn on the basis of (substitute sheets which have been furnished to the receiving Office in
response to an invitation under Article 14 are referred to in this report as "originally filed" and are not annexed to
the report since they do not contain amendments (Rules 70.16 and 70.17)):

Description, pages:

1, 3-35 as originally filed

2, 2a with telefax of 11/12/2000



2. With regard to the language, all the elements marked above were available or furnished to this Authority in the 
language in which the international application was filed, unless otherwise indicated under this item. 

These elements were available or furnished to this Authority in the following language: , which is: 

□ the language of a translation furnished for the purposes of the international search (under Rule 23.1 (b)). 

□ the language of publication of the international application (under Rule 48.3(b)). 

□ the language of a translation furnished for the purposes of international preliminary examination (under Rule 
55.2 and/or 55.3). 

3. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the 
international preliminary examination was carried out on the basis of the sequence listing: 

□ contained in the international application in written form. 

□ filed together with the international application in computer readable form. 

□ furnished subsequently to this Authority in written form. 

□ furnished subsequently to this Authority in computer readable form. 

□ The statement that the subsequently furnished written sequence listing does not go beyond the disclosure in 
the international application as filed has been furnished. 

□ The statement that the information recorded in computer readable form is identical to the written sequence 
listing has been furnished. 

4. The amendments have resulted in the cancellation of: 

□ the description, pages: 

□ the claims, Nos.: 

□ the drawings, sheets: 



Claims, Nos.:

1-14 with telefax of 11/12/2000



Form PCT/IPEA/409 (Boxes I-VIII, Sheet 1) (July 1998)

INTERNATIONAL PRELIMINARY EXAMINATION REPORT

International application No. PCT/CA99/00852



5. ☒ This report has been established as if (some of) the amendments had not been made, since they have been
considered to go beyond the disclosure as filed (Rule 70.2(c)):

(Any replacement sheet containing such amendments must be referred to under item 1 and annexed to this report.)

see separate sheet

6. Additional observations, if necessary: 

III. Non-establishment of opinion with regard to novelty, inventive step and industrial applicability

1. The questions whether the claimed invention appears to be novel, to involve an inventive step (to be
non-obvious) or to be industrially applicable have not been examined in respect of:

□ the entire international application. 

☒ claims Nos. 1-8 and 13-14 partially with respect to industrial applicability.

because: 

☒ the said international application, or the said claims Nos. 1-8 and 13-14 partially relate to the following
subject matter which does not require an international preliminary examination (specify):
see separate sheet

□ the description, claims or drawings (indicate particular elements below) or said claims Nos. are so unclear
that no meaningful opinion could be formed (specify):

□ the claims, or said claims Nos. are so inadequately supported by the description that no meaningful opinion
could be formed.

□ no international search report has been established for the said claims Nos. . 

Instructions: 

□ the written form has not been furnished or does not comply with the standard. 

□ the computer readable form has not been furnished or does not comply with the standard. 



V. Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability;
citations and explanations supporting such statement

1. Statement

Novelty (N)                     Yes: Claims 1-14
                                No: Claims

Inventive step (IS)             Yes: Claims 1-14
                                No: Claims

Industrial applicability (IA)   Yes: Claims 9-12,1
                                No: Claims



2. Citations and explanations 
see separate sheet 



VIII. Certain observations on the international application 

The following observations on the clarity of the claims, description, and drawings or on the question whether the 
claims are fully supported by the description, are made: 
see separate sheet 
The arguments filed by the applicant with a letter of 8.12.2000 have been taken into
account for establishing said report.

Point I: 



The amendments filed with the letter dated 8.12.2000 introduce subject-matter which 
extends beyond the content of the application as filed, contrary to Article 34(2)(b) PCT. 
The amendments concerned are the following: 

- claim 8: non-cancerous cells; no basis for the general term could be found in the
originally filed description (p. 12, l. 19-20 only disclose lymphocytes).

- claim 9: "tissue" 

- claim 12: no basis could be found for the claimed method starting from step b): 
"assaying a function. ..as compared to in the absence thereof". 

- page 2a: "germline" mutation; no basis could be found for mutation in this specific 
cell type. No implicit disclosure is present for the following reasons. A predisposition for 
breast cancer can be created for instance due to a spontaneous mutation of DNA in 
other cells than germline cells. The IPEA furthermore believes that such a 
predisposition can well be diagnosed from somatic cells and does not need to be 
examined in germline cells only. 



Point III: 

Claims 1-8, as well as claim 13 as long as they depend from any one of claims 1-8, 
relate to subject-matter considered by this Authority to be covered by the provisions of 
Rule 67.1(iv) PCT. Consequently, no opinion will be formulated with respect to the 
industrial applicability of the subject-matter of these claims (Article 34(4)(a)(i) PCT). 



Point V: 

Reference is made to the following documents: 

D1: AMERICAN JOURNAL OF HUMAN GENETICS,
vol. 61, no. 4 suppl, 28.10.1997 - 1.11.1997, page A64

D2: WO 97/17469



1. Articles 33(2) and (3) PCT

The subject-matter of claims 1-14 is novel (Article 33(2) PCT) in the light of the closest
prior art D1, since it is distinguished therefrom in that the androgen receptor is used for
determining an individual's predisposition to breast cancer or for screening and
selecting an agent which modulates said predisposition.

Furthermore, D1 (l. 11 from the bottom) only suggests a correlation of CAG repeat
length of AR and breast cancer.

D2 reveals a method of predicting the risk of prostate cancer morbidity and mortality
comprising determining the length of the CAG repeat of the androgen receptor gene
(abstract, claim 1).

The teaching of said document cannot be combined with D1 for the following reasons:
a particular marker for a particular cancer (prostate (D2) or breast (D1) cancer) cannot
be directly transposed to a different type of cancer due to the complexity of the genetic
regulation which operates at different hormonal receptors and the intricate interactions
of different hormones which can differently affect, in tissue specific fashion, the
transactivation of genes they regulate.

Thus, the subject-matter of claims 1-14 is also inventive according to Article 33(3) 
PCT. 



3. Industrial Applicability 

For the assessment of the present claims 1-8, as well as claim 13 as long as they 
depend from any one of claims 1-8, on the question whether they are industrially 
applicable, no unified criteria exist in the PCT contracting States. The patentability can 
also be dependent upon the formulation of the claims. The EPO, for example, does not 
recognize as industrially applicable the subject-matter of claims to the use of a 
compound in medical treatment, but may allow, however, claims to a known compound 
for first use in medical treatment and the use of such a compound for the manufacture 
of a medicament for a new medical treatment.



Point VIII: 

1. The vague and imprecise statement "spirit ... of the subject invention" (page 34,
line 12) implies that the subject-matter for which protection is sought may be
different to that defined in the claims, thereby resulting in lack of clarity of the
claims (Article 6 PCT) when used to interpret them (see the Guidelines, C-III,
4.3a). The above statement has not been deleted to remove this defect.

2. The wording of claim 12 is unclear (Article 6 PCT), since it cannot be derived 
which kind of function ("a function") of the allele is assayed. 

3. The broad terms "variant", "equivalent" and "mutation" used in claims 1 and 12 
are vague and unclear and leave the reader in doubt as to the meaning of the 
technical features to which they refer, thereby rendering the definition of the 
subject-matter of said claim unclear (Article 6 PCT). 

Furthermore, the definitions given in the description (page 21, lines 1-17, lines 1
20 and page 22, lines 3-10) do not clarify the terms, since it is not said to what
extent these substances are allowed to differ from the androgen receptor gene.
The terms "variant" and "mutation" are even so broad that they encompass any
nucleic acid molecule, since they are not even limited by the function/s of the
androgen receptor gene. Moreover, the definition of the term "equivalent" is as
vague since it is silent about how many and which function/s have to remain
compared to the androgen receptor gene.
cancer protection against breast cancer and/or responsiveness to therapy for breast cancer. The method comprises the step of determining

thereby determining an individual's predisposition to breast cancer, development of breast cancer, protection against breast cancer and/or
responsiveness to therapy for breast cancer.

protection to breast cancer. This determination can be based on a variety of
genotyping methods at the DNA, RNA or protein level. 

Another aim of the present invention is to provide a method
of prognosing and/or forecasting the development of breast cancer in a patient,
which comprises determining a CAG-repeat polymorphism of the AR gene, or
any polymorphism in linkage disequilibrium therewith, in a biological sample of
the patient, wherein a determination of the length of the CAG repeat shows a
significant association with breast cancer.

In a particular embodiment, the determination of the
polymorphism at the CAG repeat of the AR gene makes it possible to show that the
shortest alleles or a combination of the shortest alleles are associated with the
smallest breast cancer risk and the mid to long alleles or a combination of the
intermediate and longest alleles are associated with the highest breast cancer
risk (a combination of the longest alleles is associated with the highest risks of
breast cancer). Of importance, the variations of polymorphisms at the CAG
repeat locus of AR (or of an equivalent or marker in linkage disequilibrium
therewith) can account for a significant proportion of all cases of breast cancer.
Indeed, the number of breast cancer cases attributable to a variation at this AR
locus is at least three times greater than that attributable to the BRCA1 and
BRCA2 genes.

The present invention also relates to vectors, including
expression vectors harboring an AR gene (or fragment or fusion thereof) having
a genotype in accordance with the present invention (i.e. a predisposing
genotype, long CAG repeats, or alternatively, a protecting genotype, short CAG
repeats; or other genotypes isolated from patients or genetically engineered),
cells harboring such vectors, and non-human animals harboring such vectors or
cells.

Another aim of the present invention is to provide means of
identifying young women that will be at risk of developing breast cancer and to
categorize those that are likely to respond significantly to preventive therapy.
An aim of the present invention is thus to provide means of identification of
target sub-groups of women for breast cancer prevention measures/programs.

Another aim of the present invention is to provide means to
determine which sub-group of women will most benefit from breast cancer
treatment(s) and eventually predict their response to therapy or choose the
optimal preventive pharmacotherapy.

Another aim of the present invention is to identify means of
predicting and managing interventions for breast cancer as well as identifying
and/or characterizing biological parameters which could enable the
establishment of population-based breast cancer prevention and intervention
programs.

In addition, it is an aim of the present invention to provide a
method of selecting alleles of the AR gene or in linkage disequilibrium therewith,
which is suitable for designing an assay to screen compounds which can
modulate the activity of an androgen receptor.

Another aim of the present invention is to provide an assay
to screen for drugs for the treatment and/or prevention of breast cancer. Having
identified alleles which predispose to breast cancer (and those which predispose
to a "resistance" to breast cancer), assays can be set up to screen agents and
select drugs which could be used in the treatment or prevention of breast
cancer. Since some alleles of the AR have been shown to affect the
functionality of the androgen receptor (Tut et al. 1997, J. Clin. Endocrinol.
89(11):3777-3782), assays could be designed based on chosen genotypes of
the AR gene. A non-limiting example of a type of assay which could be
designed includes cis-trans assays similar to those described in USP
4,981,784. For example, a cis-trans assay could be set up, based on the use
of a genotype of AR shown here to predispose to breast cancer (i.e. the long
CAG alleles in the AR gene) as compared to a genotype of AR shown here to
be associated with lower risk of breast cancer, and used to screen compounds.
A non-limiting example of such an assay could be based on 2 cell lines (one
expressing a predisposing genotype of AR and one expressing a
non-predisposing genotype of AR) which could be used in parallel to screen for
AR-function modulating compounds. Of course, it will be understood that the cell
line expressing the non-predisposing genotype of AR (the shorter alleles) can
be used as a positive control for the functionality of the androgen receptor.

It is thus an aim of the present invention to provide the means
to identify compounds which could positively modulate the function of AR having
a breast cancer predisposing genotype (such as the long CAG alleles), to the
level of the protecting genotype thereof (such as the short CAG alleles).

In a particular embodiment, such assays can be designed
using cells from patients having a known genotype at the loci of the present
invention. These cells, harboring recombinant vectors, could enable an
assessment of the functionality of the AR and dissect the structure-function
relationship of the androgen receptor and its role in breast cancer.
It shall be understood that the polymorphism of the AR and/or
the determination of allelic variations in the AR gene can be combined with the
determination of allelic variations in other genes/markers linked to the
predisposition to breast cancer and/or responsiveness to therapy therefor. This
combination of genotype analyses could lead to better diagnostic programs
and/or treatment of breast cancer. Non-limiting examples of such markers
include BRCA1 and BRCA2.

It shall also be understood that although breast cancer is
significantly more preponderant in women, it can also be a deadly disease in
men. Thus, the present invention is meant to also cover men.

In accordance with the present invention, there is therefore
provided a method of determining an individual's predisposition to breast cancer,
development of breast cancer and/or responsiveness to therapy for breast
cancer, which comprises determining a genotype at the CAG-repeat locus of the
androgen receptor (directly or indirectly by linkage disequilibrium) in a biological
sample of the individual and analyzing allelic variation in the androgen receptor
of the individual, thereby determining an individual's predisposition to breast
cancer, development of breast cancer and/or responsiveness to therapy
therefor.

In accordance with the present invention there is provided a
method for determining susceptibility to breast cancer, and/or response to
therapy therefor. The method comprises the step of determining the androgen
receptor genotype of the individual, thereby determining an individual's
susceptibility to breast cancer and/or response to therapy therefor.

Numerous methods for determining a genotype are known
and available to the skilled artisan. All these genotype determination methods are
within the scope of the present invention. Non-limiting examples of genotype
determination include a restriction endonuclease digestion, a hybridization with
allele specific oligonucleotides, a sequencing of the polymorphism, and an
amplification of a segment of the androgen receptor (e.g. by PCR).

In accordance with the present invention, there is therefore
provided a method of determining an individual's predisposition to breast cancer,
development of breast cancer and/or responsiveness to therapy therefor, which
comprises determining androgen receptor polymorphism (directly or indirectly
using a marker in linkage disequilibrium with the CAG repeat polymorphism) in
a biological sample of the individual and analyzing allelic variation in the
androgen receptor gene of the individual, thereby determining an individual's
predisposition to breast cancer, development of breast cancer and/or
responsiveness to therapy therefor.

In accordance with one embodiment of the invention, there
is provided a specific model for use in prediction of breast cancer susceptibility
and prognosis. The model comprises an androgen receptor gene
polymorphism at the CAG repeat locus, which allows the identification of a subset of
women that are at significantly increased risk of breast cancer as compared to
those bearing other variants of this gene.

In accordance with a preferred embodiment of the present
invention, a single gene, the androgen receptor gene, has been identified as
such a target to assess this predisposition.

In accordance with the present invention, the androgen
receptor polymorphism, without limitation, is selected from the CAG repeats
located in the first exon of the AR gene, or any DNA variant or mutation which
shows some degree of linkage disequilibrium with one of the polymorphisms at
the CAG-repeat locus of the AR gene.

In some embodiments, the method of the present invention
includes detecting the androgen receptor polymorphism by analyzing the
restriction fragment length polymorphisms using an endonuclease digestion.
The method can further include a step prior to the androgen receptor gene
digestion, wherein at least a fragment of the androgen receptor is amplified, for
example, by polymerase chain reaction.
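
As a rough illustration of the restriction-fragment analysis described above, the following Python sketch digests a sequence in silico and reports fragment lengths. The enzyme (EcoRI, recognition site GAATTC, cutting after the first G), the function name `digest` and the toy sequences are assumptions chosen for illustration; the description does not name an enzyme, and the toy sequences are not AR gene sequences.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return the fragment lengths after cutting `seq` at every `site`.

    `cut_offset` is the position of the cut within the recognition site
    on the top strand (1 for EcoRI, which cuts G^AATTC). Only the top
    strand is modelled; this is a simplification.
    """
    lengths, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    lengths.append(len(seq) - start)
    return lengths


# Two hypothetical alleles: one carries the site, the other has lost it.
allele_with_site = "AAAGAATTCTTT"
allele_without_site = "AAAGACTTCTTT"

pattern_a = digest(allele_with_site)
pattern_b = digest(allele_without_site)
```

Alleles that differ at the recognition site give different fragment patterns, which is the basis of an RFLP readout.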

In accordance with a preferred embodiment of the present
invention, a pair of primers is designed to specifically amplify a segment of the
androgen receptor. In an especially preferred embodiment, the region of the AR
gene which is amplified is in exon 1. This pair of primers is preferably derived
from a nucleic acid sequence of the androgen receptor gene or flanking portion
thereof, to amplify a segment of the androgen receptor gene, as commonly
known. Of course, other primer pairs can be designed, based on the known
sequence of the AR gene. Methods to design primer pairs from known
sequences are commonly known in the art.

In accordance with a preferred embodiment of the present
invention, primers used for amplifying the segment of the androgen receptor are
defined as:

5'-TCCAGAATCTGTTCCAGAGCGTGC-3' (SEQ ID NO:1); and
5'-GCTGTGAAGGTTGCTGTTCCTCAT-3' (SEQ ID NO:2).
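
To make the role of the primer pair concrete, the following Python sketch locates the two primers on a template strand and returns the segment they would bound. The primer sequences are SEQ ID NO:1 and SEQ ID NO:2 as given above; the function names and the toy template (with an arbitrary run of 21 CAG units and two-base spacers) are assumptions for illustration and do not reproduce the real AR exon 1 sequence.

```python
FWD = "TCCAGAATCTGTTCCAGAGCGTGC"  # SEQ ID NO:1
REV = "GCTGTGAAGGTTGCTGTTCCTCAT"  # SEQ ID NO:2

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template, fwd=FWD, rev=REV):
    """Return the template segment bounded by the primer pair, or None.

    The forward primer is searched as written; the reverse primer anneals
    to the opposite strand, so its reverse complement is searched on the
    template strand, downstream of the forward primer.
    """
    start = template.find(fwd)
    if start < 0:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return template[start:end + len(rev_site)]


# Toy template, NOT the real AR sequence: flanks, primers, 21 CAG units.
toy_template = ("GGG" + FWD + "AT" + "CAG" * 21 + "TA"
                + reverse_complement(REV) + "CCC")
product = amplicon(toy_template)
```

Because the product length changes by three bases per CAG unit, sizing the amplified segment is one way the repeat number could be read out.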
The polymorphism of the androgen receptor gene can be
detected using at least one oligonucleotide specific to the normal or variant
androgen receptor gene allele.

The present invention also provides a kit for determining
predisposition to low, intermediate or high risk of breast cancer of a patient,
which includes at least a probe specific for the androgen receptor; a
polymorphism selected from the group consisting of a CAG repeat and other
polymorphisms in linkage disequilibrium with the CAG repeat polymorphism.

In one embodiment, the present invention provides a specific
detection of the CAG repeat polymorphism of the AR gene using a nucleic acid
for the specific detection of this AR polymorphism in a sample comprising the
above-described CAG-repeat-containing nucleic acid sequence (i.e. DNA, RNA,
cDNA) and at least a nucleic acid sequence which binds under stringent
conditions to the CAG-repeat-containing nucleic acid sequence.

In one preferred embodiment, the present invention relates to
nucleic acid probes which are complementary to a CAG-repeat-containing
nucleic acid sequence, consisting of at least 10 consecutive nucleotides
(preferably, 15, 20, 25, or 30) and which specifically hybridize to the AR nucleic
acid sequence comprising the CAG repeat polymorphism under high stringency
conditions.

In one embodiment of the above described method, a nucleic
acid probe is immobilized on a solid support. Non-limiting examples of solid
supports include plastics (e.g. polycarbonate), acrylic resins (e.g. polyacrylamide
and latex beads); and carbohydrates (e.g. agarose and sepharose). Techniques
for coupling nucleic acid probes to solid supports are well known in the art.

Similarly to the probes of the present invention, the antibodies
of the present invention can be immobilized on a solid support. As known in the
art, similar supports as those used for probe immobilization can be used for
antibody immobilization on a solid support. Also well known in the art are the
techniques for coupling antibodies to such solid supports. The immobilized
antibodies of the present invention can be used for in vitro, in vivo, and in situ
assays as well as in immunochromatography according to known methods.

Non-limiting examples of test samples suitable for carrying
out the methods of the present invention include cells or nucleic acid extracts of
cells, or biological fluids. Of course, the type of test sample used can vary
according to the assay format, the method of detection, and the particular needs
of the clinical practitioner, who will readily adapt the methods of preparation of
the sample and the method of detection so that they are compatible, in
accordance with the knowledge in the art.

In accordance with one embodiment of the present invention,
the allelic variation in the androgen receptor gene is analyzed indirectly using a
nucleic acid variant, or equivalent in linkage disequilibrium with a CAG repeat.
The allelic variation in the androgen receptor gene can also be analyzed directly
by determining the number of CAG repeats within the androgen receptor gene.
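
Where an allele has been sequenced, the direct determination mentioned above reduces to counting CAG units. The following Python sketch is one minimal way to do this, assuming the allele is available as a plain DNA string and that the repeat number is taken as the longest uninterrupted run of CAG; the function name and the example strings are hypothetical.

```python
import re


def longest_cag_run(seq):
    """Return the number of CAG units in the longest uninterrupted run.

    Returns 0 when the sequence contains no CAG triplet at all.
    """
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical allele fragments, not real patient data.
short_allele = "GCGT" + "CAG" * 17 + "GAGACT"
long_allele = "GCGT" + "CAG" * 26 + "GAGACT"
repeat_counts = (longest_cag_run(short_allele), longest_cag_run(long_allele))
```

How a given count maps to a risk category is not fixed by this sketch; any cut-off values would have to come from the association data themselves.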

In accordance with the present invention, the polymorphism
of the androgen receptor (AR) gene can be used as a marker for breast cancer
susceptibility. The polymorphism in linkage disequilibrium with the markers used
can also be used as a test for breast cancer susceptibility, or for responsiveness
to treatment for breast cancer, for breast cancer prognosis or severity, or as a
means to classify patients in clinical trials for breast cancer (screening,
diagnosis, prognosis or treatment).

In order to provide a clear and consistent understanding 
of terms used in the present description, a number of definitions are provided 
hereinbelow. 

As used herein the term "RFLP" refers to restriction
fragment length polymorphism.

The terms "polymorphism", "DNA polymorphism" and the 
like, refer to any sequence in the human genome which exists in more than one 
version or variant in the population. 
The term "linkage disequilibrium" refers to any degree of
non-random genetic association between one or more allele(s) of two different
polymorphic DNA sequences, that is due to the physical proximity of the two loci.
Linkage disequilibrium is present when two DNA segments that are very close
to each other on a given chromosome will tend to remain unseparated for
several generations with the consequence that alleles of a DNA polymorphism
(or marker) in one segment will show a non-random association with the alleles
of a different DNA polymorphism (or marker) located in the other DNA segment
nearby. Hence, testing of a marker in linkage disequilibrium with the
polymorphisms of the present invention at the AR gene (indirect testing), will
give almost the same information as testing for the CAG repeat polymorphism
of the AR gene directly. This situation is encountered throughout all the human
genome when two DNA polymorphisms that are very close to each other are
studied. Such a linkage disequilibrium has been reported with several
polymorphisms in several genes (e.g. the vitamin D receptor gene [Morrison et
al., 1994, Nature 367:284-287, and USP 5,593,033]). Various degrees of
linkage disequilibrium can be encountered between two genetic markers so that
some are more closely associated than others.
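
The "degree" of linkage disequilibrium referred to above is commonly summarized by the statistics D, D' and r². The following Python sketch computes them for two biallelic markers from haplotype and allele frequencies. Treating the markers as biallelic is a simplifying assumption (the CAG repeat itself is multi-allelic), and the function name and input frequencies are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Independent markers versus markers that always travel together.
no_ld = ld_stats(0.25, 0.5, 0.5)
full_ld = ld_stats(0.5, 0.5, 0.5)
```

A marker with r² close to 1 relative to the CAG-repeat alleles would carry nearly the same information as direct testing; lower values correspond to the weaker associations mentioned in the text.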

The terms "androgen receptor polymorphism" or "genetic
marker" are intended to include, without limitation, the CAG-repeat
polymorphism in exon 1, and any other allelic variant of the androgen receptor
gene that shows some degree of linkage disequilibrium in any population
sub-group with at least one of the above-mentioned androgen receptor
polymorphisms.

The androgen receptor gene polymorphism sites in
accordance with the present invention can be located within the androgen
receptor gene, or on each side thereof, provided that the site is on the same
chromosome and in linkage disequilibrium with the AR polymorphism of the
present invention. Distances between markers in linkage disequilibrium can
vary widely (below 50 kb to more than 1 megabase) depending on the genetic
structure of the population, and linkage disequilibrium is ascertainable by a
statistically significant association between the markers.

It shall be recognized by the person skilled in the art to
which the present invention pertains that, since some of the polymorphisms
herein identified in the AR gene can be within the coding region of the gene and
therefore expressed, the present invention should not be limited to the
identification of polymorphisms at the DNA level (whether on genomic DNA,
amplified DNA, cDNA or the like). Indeed, the herein-identified polymorphisms
could be detected at the mRNA or protein level. Such methods of detecting
polymorphisms on mRNA or protein are known in the art.
Non-limiting examples include detection based on oligos designed to hybridize to
mRNA or ligands such as antibodies which are specific to the encoded
polymorphism (i.e. specific to the protein fragment encoded by the CAG repeat
for example).

Since some of the polymorphisms of the present invention
are expressed, one of the advantages of the present invention is to enable a
determination of the polymorphisms in the AR gene, in easily obtainable cells
which express these genes. A non-limiting example thereof is lymphocytes,
thereby enabling a genotyping from a simple blood sample.

Nucleotide sequences are presented herein by single
strand, in the 5' to 3' direction, from left to right, using the one letter nucleotide
symbols as commonly used in the art and in accordance with the
recommendations of the IUPAC-IUB Biochemical Nomenclature Commission.

Unless defined otherwise, the scientific and technological
terms and nomenclature used herein have the same meaning as commonly
understood by a person of ordinary skill to which this invention pertains.
Generally, the procedures for cell cultures, infection, molecular biology methods
and the like are common methods used in the art. Such standard techniques
can be found in reference manuals such as for example Sambrook et al. (1989,
Molecular Cloning - A Laboratory Manual, Cold Spring Harbor Laboratories) and
Ausubel et al. (1994, Current Protocols in Molecular Biology, Wiley, New York).

The present description refers to a number of routinely
used recombinant DNA (rDNA) technology terms. Nevertheless, definitions of
selected examples of such rDNA terms are provided for clarity and consistency.

As used herein, "nucleic acid molecule" refers to a
polymer of nucleotides. Non-limiting examples thereof include DNA (e.g. genomic
DNA, cDNA) and RNA molecules (e.g. mRNA). The nucleic acid molecule can
be obtained by cloning techniques or synthesized. DNA can be double-stranded
or single-stranded (coding strand or non-coding strand [antisense]).

The term "recombinant DNA" as known in the art refers 
to a DNA molecule resulting from the joining of DNA segments. This is often 
referred to as genetic engineering. 

The term "DNA segment" is used herein to refer to a
DNA molecule comprising a linear stretch or sequence of nucleotides. This
sequence, when read in accordance with the genetic code, can encode a linear
stretch or sequence of amino acids which can be referred to as a polypeptide,
protein, protein fragment and the like.

The terminology "amplification pair" refers herein to a pair
of oligonucleotides (oligos) of the present invention, which are selected to be
used together in amplifying a selected nucleic acid sequence by one of a
number of types of amplification processes, preferably a polymerase chain
reaction. Other types of amplification processes include ligase chain reaction,
strand displacement amplification, or nucleic acid sequence-based amplification,
as explained in greater detail below. As commonly known in the art, the oligos
are designed to bind to a complementary sequence under selected conditions.

The nucleic acid (i.e. DNA or RNA) for practicing the 
present invention may be obtained according to well known methods. 

Oligonucleotide probes or primers of the present invention
may be of any suitable length, depending on the particular assay format and the
particular needs and targeted genomes employed. In general, the
oligonucleotide probes or primers are at least 12 nucleotides in length,
preferably between 15 and 24 nucleotides, and they may be adapted to be
especially suited to a chosen nucleic acid amplification system. As commonly
known in the art, the oligonucleotide probes and primers can be designed by
taking into consideration the melting point of hybridization thereof with their
targeted sequence (see below and in Sambrook et al., 1989, Molecular Cloning
- A Laboratory Manual, 2nd Edition, CSH Laboratories; Ausubel et al., 1989, in
Current Protocols in Molecular Biology, John Wiley & Sons Inc., N.Y.).
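
As a rough illustration of how melting temperature can enter into primer
design, the sketch below applies the Wallace rule, a commonly used rule of
thumb for short oligonucleotides (2°C per A or T, 4°C per G or C). This rule
is an assumption introduced here for illustration and is not stated in the
present description; the function names are hypothetical. Only the minimum
length of 12 nucleotides is taken from the text above.

```python
MIN_LENGTH = 12  # lower bound on probe/primer length given in the text


def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("oligo must contain only A, C, G and T")
    return 2 * at + 4 * gc


def meets_minimum_length(oligo: str) -> bool:
    """Check the oligo against the minimum length of 12 nucleotides."""
    return len(oligo) >= MIN_LENGTH
```

The two oligos of an amplification pair would typically be chosen so that
their estimated melting temperatures are close to one another.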

The term "oligonucleotide" or "DNA" molecule or
sequence refers to a molecule comprised of the deoxyribonucleotides adenine
(A), guanine (G), thymine (T) and/or cytosine (C), in a double-stranded form,
and comprises or includes a "regulatory element" according to the present
invention, as the term is defined herein. An "oligonucleotide" or "DNA"
can be found in linear DNA molecules or fragments, viruses, plasmids, vectors,
chromosomes or synthetically derived DNA. As used herein, particular
double-stranded DNA sequences may be described according to the normal
convention of giving only the sequence in the 5' to 3' direction.
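
Because only one strand is written out under this convention, the other
strand is recovered as the reverse complement. A minimal sketch, in which the
function name is hypothetical and only the four bases A, C, G and T are
assumed:

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand, also written 5' to 3'."""
    # Complement every base, then reverse so the result reads 5' to 3'.
    return sequence.translate(COMPLEMENT)[::-1]
```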

"Nucleic acid hybridization" refers generally to the
hybridization of two single-stranded nucleic acid molecules having
complementary base sequences, which under appropriate conditions will form
a thermodynamically favored double-stranded structure. Examples of
hybridization conditions can be found in the two laboratory manuals referred
to above (Sambrook et al., 1989, supra and Ausubel et al., 1989, supra) and are
commonly known in the art. In the case of a hybridization to a nitrocellulose
filter, as for example in the well known Southern blotting procedure, a
nitrocellulose filter can be incubated overnight at 65°C with a labeled probe in
a solution containing 50% formamide, high salt (5 x SSC or 5 x SSPE), 5 x
Denhardt's solution, 1% SDS, and 100 µg/ml denatured carrier DNA (i.e. salmon
sperm DNA). The non-specifically binding probe can then be washed off the
filter by several washes in 0.2 x SSC/0.1% SDS at a temperature which is
selected in view of the desired stringency: room temperature (low stringency),
42°C (moderate stringency) or 65°C (high stringency). The selected temperature
is based on the melting temperature (Tm) of the DNA hybrid. Of course,
RNA-DNA hybrids can also be formed and detected. In such cases, the
conditions of hybridization and washing can be adapted according to well known
methods by the person of ordinary skill. Stringent conditions will preferably be
used (Sambrook et al., 1989, supra).

Probes of the invention can be utilized with naturally
occurring sugar-phosphate backbones as well as modified backbones including
phosphorothioates, dithionates, alkyl phosphonates and α-nucleotides and the
like. Modified sugar-phosphate backbones are generally taught by Miller, 1988,
Ann. Reports Med. Chem. 23:295 and Moran et al., 1987, Nucleic
Acids Res., 14:5019. Probes of the invention can be constructed of either
ribonucleic acid (RNA) or deoxyribonucleic acid (DNA), and preferably of DNA.

The types of detection methods in which probes can be
used include Southern blots (DNA detection), dot or slot blots (DNA, RNA), and
Northern blots (RNA detection). Although less preferred, labeled proteins could
also be used to detect a particular nucleic acid sequence to which they bind. More
recently, peptide nucleic acids (PNAs) have been described (Nielsen et al., 1999,
Current Opin. Biotechnol. 10:71-75). PNAs could also be used to detect the
polymorphisms of the present invention. Other detection methods include kits
containing probes on a dipstick setup and the like.

Although the present invention is not specifically
dependent on the use of a label for the detection of a particular nucleic acid
sequence, such a label might be beneficial, by increasing the sensitivity of the
detection. Furthermore, it enables automation. Probes can be labeled according
to numerous well known methods (Sambrook et al., 1989, supra). Non-limiting
examples of labels include 3H, 14C, 32P, and 35S. Non-limiting examples of
detectable markers include ligands, fluorophores, chemiluminescent agents,
enzymes, and antibodies. Other detectable markers for use with probes, which
can enable an increase in sensitivity of the method of the invention, include
biotin and radionucleotides. It will become evident to the person of ordinary skill
that the choice of a particular label dictates the manner in which it is bound to
the probe.

As commonly known, radioactive nucleotides can be
incorporated into probes of the invention by several methods. Non-limiting
examples thereof include kinasing the 5' ends of the probes using gamma-32P
ATP and polynucleotide kinase, using the Klenow fragment of Pol I of E. coli in
the presence of radioactive dNTP (i.e. uniformly labeled DNA probe using
random oligonucleotide primers in low-melt gels), using the SP6/T7 system to
transcribe a DNA segment in the presence of one or more radioactive NTP, and
the like.

As used herein, "oligonucleotides" or "oligos" define a
molecule having two or more nucleotides (ribo or deoxyribonucleotides). The
size of the oligo will be dictated by the particular situation and ultimately by the
particular use thereof, and adapted accordingly by the person of ordinary skill.
An oligonucleotide can be synthesized chemically or derived by cloning
according to well known methods.

As used herein, a "primer" defines an oligonucleotide
which is capable of annealing to a target sequence, thereby creating a double
stranded region which can serve as an initiation point for DNA synthesis under
suitable conditions.

Amplification of a selected, or target, nucleic acid
sequence may be carried out by a number of suitable methods. See generally
Kwoh et al., 1990, Am. Biotechnol. Lab. 8:14-25. Numerous amplification
techniques have been described and can be readily adapted to suit particular
needs of a person of ordinary skill. Non-limiting examples of amplification
techniques include polymerase chain reaction (PCR), ligase chain reaction
(LCR), strand displacement amplification (SDA), transcription-based
amplification, the Qβ replicase system and NASBA (Kwoh et al., 1989, Proc.
Natl. Acad. Sci. USA 86, 1173-1177; Lizardi et al., 1988, BioTechnology 6:1197-
1202; Malek et al., 1994, Methods Mol. Biol., 28:253-260; and Sambrook et al.,
1989, supra). Preferably, amplification will be carried out using PCR.

Polymerase chain reaction (PCR) is carried out in
accordance with known techniques. See, e.g., U.S. Pat. Nos. 4,683,195;
4,683,202; 4,800,159; and 4,965,188 (the disclosures of all of these U.S. patents
are incorporated herein by reference). In general, PCR involves a treatment of
a nucleic acid sample (e.g., in the presence of a heat stable DNA polymerase)
under hybridizing conditions, with one oligonucleotide primer for each strand of
the specific sequence to be detected. An extension product of each primer which
is synthesized is complementary to each of the two nucleic acid strands, with the
primers sufficiently complementary to each strand of the specific sequence to
hybridize therewith. The extension product synthesized from each primer can
also serve as a template for further synthesis of extension products using the
same primers. Following a sufficient number of rounds of synthesis of extension
products, the sample is analysed to assess whether the sequence or sequences
to be detected are present. Detection of the amplified sequence may be carried
out by visualization following EtBr staining of the DNA following gel
electrophoresis, or using a detectable label in accordance with known techniques,
and the like. For a review on PCR techniques, see PCR Protocols, A Guide to
Methods and Amplifications, Michael et al. Eds, Acad. Press, 1990.
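
Because each extension product can itself serve as a template, the amount of
target sequence grows geometrically with the number of rounds. The sketch
below illustrates this under an idealized model; the function name and the
efficiency parameter are assumptions introduced here, and reagent depletion
and plateau effects are ignored.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after a number of PCR cycles.

    efficiency is the fraction of templates extended in each cycle
    (1.0 corresponds to perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    # Each cycle multiplies the template count by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles
```

With perfect doubling, a single template yields 2^30 (about one billion)
copies after 30 cycles, which is why a "sufficient number of rounds" renders
the target detectable by gel staining.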

Ligase chain reaction (LCR) is carried out in accordance
with known techniques (Weiss, 1991, Science 254:1292). Adaptation of the
protocol to meet the desired needs can be carried out by a person of ordinary
skill. Strand displacement amplification (SDA) is also carried out in accordance
with known techniques or adaptations thereof to meet the particular needs
(Walker et al., 1992, Proc. Natl. Acad. Sci. USA 89:392-396; and ibid., 1992,
Nucleic Acids Res. 20:1691-1696).

As used herein, the term "gene" is well known in the art
and relates to a nucleic acid sequence defining a single protein or polypeptide.
A "structural gene" defines a DNA sequence which is transcribed into RNA and
translated into a protein having a specific amino acid sequence, thereby giving
rise to a specific polypeptide or protein.

A "heterologous" (i.e. a heterologous gene) region of a
DNA molecule is a subsegment of DNA within a larger segment that
is not found in association therewith in nature. The term "heterologous" can be
similarly used to define two polypeptidic segments not joined together in nature.
Non-limiting examples of heterologous genes include reporter genes such as
luciferase, chloramphenicol acetyl transferase, β-galactosidase, and the like
which can be juxtaposed or joined to heterologous control regions or to
heterologous polypeptides.

The term "vector" is commonly known in the art and
defines a plasmid DNA, phage DNA, viral DNA and the like, which can serve as
a DNA vehicle into which DNA of the present invention can be cloned.
Numerous types of vectors exist and are well known in the art.

The term "expression" defines the process by which a
gene is transcribed into mRNA (transcription), the mRNA then being
translated (translation) into one polypeptide (or protein) or more.

The terminology "expression vector" defines a vector or
vehicle as described above but designed to enable the expression of an inserted
sequence following transformation into a host. The cloned gene (inserted
sequence) is usually placed under the control of control element sequences
such as promoter sequences. The placing of a cloned gene under such control
sequences is often referred to as being operably linked to control elements or
sequences.

Operably linked sequences may also include two
segments that are transcribed into the same RNA transcript. Thus, two
sequences, such as a promoter and a "reporter sequence", are operably linked
if transcription commencing in the promoter will produce an RNA transcript of the
reporter sequence. In order to be "operably linked" it is not necessary that two
sequences be immediately adjacent to one another.

Expression control sequences will vary depending on
whether the vector is designed to express the operably linked gene in a
prokaryotic or eukaryotic host or both (shuttle vectors) and can additionally
contain transcriptional elements such as enhancer elements, termination
sequences, tissue-specificity elements, and/or translational initiation and
termination sites.

Prokaryotic expression systems are useful for the preparation of
large quantities of the protein encoded by the DNA sequence of interest. This
protein can be purified according to standard protocols that take advantage of
the intrinsic properties thereof, such as size and charge (i.e. SDS gel
electrophoresis, gel filtration, centrifugation, ion exchange chromatography...).
In addition, the protein of interest can be purified via affinity chromatography
using polyclonal or monoclonal antibodies. The purified protein can be used for
therapeutic applications.

The DNA construct can be a vector comprising a promoter
that is operably linked to an oligonucleotide sequence of the present invention,
which is in turn operably linked to a heterologous gene, such as the gene for the
luciferase reporter molecule. "Promoter" refers to a DNA regulatory region
capable of binding directly or indirectly to RNA polymerase in a cell and initiating
transcription of a downstream (3' direction) coding sequence. For purposes of
the present invention, the promoter is bounded at its 3' terminus by the
transcription initiation site and extends upstream (5' direction) to include the
minimum number of bases or elements necessary to initiate transcription at
levels detectable above background. Within the promoter will be found a
transcription initiation site (conveniently defined by mapping with S1 nuclease),
as well as protein binding domains (consensus sequences) responsible for the
binding of RNA polymerase. Eukaryotic promoters will often, but not always,
contain "TATA" boxes and "CCAT" boxes. Prokaryotic promoters contain
Shine-Dalgarno sequences in addition to the -10 and -35 consensus sequences.

In accordance with one embodiment of the present
invention, an expression vector can be constructed to assess the functionality
of specific alleles of the AR gene and of the interaction of such alleles.
Non-limiting examples of such expression vectors include a vector comprising the
androgen responsive element (the cis sequences [i.e. DNA sequence to which
a factor binds] enabling androgen-dependent modulating effects of promoter
activity are known in the art) operably linked to a chosen promoter and
modulating the activity thereof, the promoter driving the expression of a reporter
gene. When such a vector is transfected into a cell expressing AR, the modulating
effect on the promoter activity can be assessed by determining the level of
expression of the reporter gene. In one embodiment, the vector is transfected
into a cell of a patient having the genotype of AR shown herein to be associated
with a low risk of breast cancer, or into a cell from a patient having the genotype
of AR shown herein to be associated with a moderate or high risk of breast
cancer. These cells can serve to screen for compounds that modulate the
promoter activity, in order to identify compounds that could be used to treat,
especially, patients predicted to be at moderate or high risk of breast cancer.

Of course, it will be understood that the AR gene expressed by these cells can
be modified at will (i.e. by in vitro mutagenesis or the like). Similarly, numerous
combinations of genotypes can be tested in such assays to dissect the
functional relationship between the AR genotype and its role in
androgen-dependent function and/or its role in breast cancer. It will also be
clear to the skilled artisan that such indicator cells expressing AR could also be
engineered by choosing a cell line and transfecting thereinto chosen genotypes
of AR and one expression vector as described above. Non-human transgenic
animals expressing chosen alleles of AR could also be prepared and used to
screen compounds that affect androgen receptor function and possibly
overcome a predisposition to breast cancer.

As used herein, the designation "functional derivative"
denotes, in the context of a functional derivative of a sequence, whether a
nucleic acid or amino acid sequence, a molecule that retains a biological activity
(either functional or structural) that is substantially similar to that of the original
sequence. This functional derivative or equivalent may be a natural derivative
or may be prepared synthetically. Such derivatives include amino acid
sequences having substitutions, deletions, or additions of one or more amino
acids, provided that the biological activity of the protein is conserved. The same
applies to derivatives of nucleic acid sequences which can have substitutions,
deletions, or additions of one or more nucleotides, provided that the biological
activity of the sequence is generally maintained. When relating to a protein
sequence, the substituting amino acid has chemico-physical properties which are
similar to those of the substituted amino acid. The similar chemico-physical
properties include similarities in charge, bulkiness, hydrophobicity,
hydrophilicity and the like. The term "functional derivatives" is intended to
include "fragments", "segments", "variants", "analogs" or "chemical derivatives"
of the subject matter of the present invention.

Thus, the term "variant" refers herein to a protein or
nucleic acid molecule which is substantially similar in structure and biological
activity to the protein or nucleic acid of the present invention.

The functional derivatives of the present invention can be
synthesized chemically or produced through recombinant DNA technology; all
of these methods are well known in the art.

As used herein, "chemical derivatives" is meant to cover
additional chemical moieties not normally part of the subject matter of the
invention. Such moieties could affect the physico-chemical characteristics of the
derivative (i.e. solubility, absorption, half life, decrease of toxicity and the like).
Such moieties are exemplified in Remington's Pharmaceutical Sciences (1980).
Methods of coupling these chemical-physical moieties to a polypeptide are well
known in the art.

The term "allele" defines an alternative form of a gene
which occupies a given locus on a chromosome.

As commonly known, a "mutation" is a detectable change
in the genetic material which can be transmitted to a daughter cell. As well
known, a mutation can be, for example, a detectable change in one or more
deoxyribonucleotides. For example, nucleotides can be added, deleted,
substituted for, inverted, or transposed to a new position. Spontaneous
mutations and experimentally induced mutations exist. The result of a mutation
of a nucleic acid molecule is a mutant nucleic acid molecule. A mutant polypeptide
can be encoded from this mutant nucleic acid molecule.

As used herein, the term "purified" refers to a molecule 
having been separated from a cellular component. Thus, for example, a "purified 
protein" has been purified to a level not found in nature. A "substantially pure" 
molecule is a molecule that is lacking in all other cellular components. 

As used herein, the terms "molecule", "compound", or
"agent" are used interchangeably and broadly to refer to natural, synthetic or
semi-synthetic molecules or compounds. The term "molecule" therefore
denotes for example chemicals, macromolecules, cell or tissue extracts (from
plants or animals) and the like. Non-limiting examples of molecules include
nucleic acid molecules, peptides, ligands, including antibodies, carbohydrates
and pharmaceutical agents. The agents can be selected and screened by a
variety of means including random screening, rational selection and by rational
design using for example protein or ligand modelling methods such as computer
modelling. The terms "rationally selected" or "rationally designed" are meant to
define compounds which have been chosen based on the configuration of the
interaction domains of the present invention. As will be understood by the
person of ordinary skill, macromolecules having non-naturally occurring
modifications are also within the scope of the term "molecule". For example,
peptidomimetics, well known in the pharmaceutical industry and generally
referred to as peptide analogs, can be generated by modelling as mentioned
above. Similarly, in a preferred embodiment, the polypeptides of the present
invention are modified to enhance their stability. It should be understood that
in most cases this modification should not alter the biological activity of the
protein. The molecules identified in accordance with the teachings of the
present invention have a therapeutic value in diseases or conditions in which an
apparently lower activity and/or level of the AR is linked to a genotype of AR
identified in accordance with the present invention. Alternatively, the molecules
identified in accordance with the teachings of the present invention find utility in
the development of compounds which can modulate the activity and/or level of
the androgen receptor in an animal and/or overcome a predisposition to breast
cancer.

As used herein, agonists and antagonists also include
potentiators of known compounds with such agonist or antagonist properties.
In one embodiment, modulators of the level or the activity of the AR can be
identified and selected by contacting the indicator cell with a compound or
mixture or library of molecules for a fixed period of time. In certain
embodiments, the "breast cancer-low risk-associated alleles" of the AR gene can
be used as positive controls.

An indicator cell in accordance with the present invention
can be used to identify antagonists. For example, the test molecule or
molecules are incubated with the host cell in conjunction with one or more
agonists held at a fixed concentration. An indication of the presence and relative
strength of the antagonistic properties of the molecule(s) can be provided by
comparing the level of gene expression in the indicator cell in the presence of the
agonist, in the absence of test molecules versus in the presence thereof. Of
course, the antagonistic effect of a molecule can also be determined in the
absence of agonist, simply by comparing the level of expression of the reporter
gene product in the presence and absence of the test molecule(s).
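
The comparison described above can be expressed, for illustration, as a
percent inhibition of the agonist-induced reporter signal. The sketch below is
one conventional way of making that comparison and is not prescribed by the
present description; the function name, the arbitrary reporter units and the
optional background correction are assumptions.

```python
def percent_inhibition(agonist_only: float,
                       agonist_plus_test: float,
                       background: float = 0.0) -> float:
    """Percent reduction of agonist-induced reporter expression.

    agonist_only      -- reporter signal with agonist, no test molecule
    agonist_plus_test -- reporter signal with agonist and test molecule
    background        -- signal without agonist (hypothetical correction)
    """
    window = agonist_only - background
    if window <= 0:
        raise ValueError("agonist signal must exceed background")
    # Fraction of the agonist-induced signal lost when the test molecule is added.
    return 100.0 * (agonist_only - agonist_plus_test) / window
```

A larger value would indicate a stronger antagonistic effect of the test
molecule at the fixed agonist concentration; a negative value would suggest
potentiation rather than antagonism.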

It shall be understood that the "in vivo" experimental
model can also be used to carry out an "in vitro" assay. For example, cellular
extracts from the indicator cells can be prepared and used in an "in vitro" test.
Non-limiting examples thereof include binding assays.

As used herein the recitation "indicator cells" refers to cells
that express a given genotype of AR according to the present invention. As
alluded to above, such indicator cells can be used in the screening assays of the
present invention. In certain embodiments, the indicator cells have been
engineered so as to express a chosen derivative, fragment, homolog, or mutant
of a genotype of the present invention. The cells can be yeast cells or higher
eukaryotic cells such as mammalian cells. In one particular embodiment, the
indicator cell would be a yeast cell harboring vectors enabling the use of the two
hybrid system technology, as well known in the art (Ausubel et al., 1994, supra),
and can be used to test a compound or a library thereof. In another
embodiment, the cis-trans assay as described in USP 4,981,784 can be
adapted and used in accordance with the present invention. Such an indicator
cell could be used to rapidly screen at high-throughput a vast array of test
molecules. In a particular embodiment, the reporter gene is luciferase or β-Gal.

In some embodiments, it might be beneficial to express
a fusion protein. The design of constructs therefor and the expression and
production of fusion proteins are well known in the art (Sambrook et al.,
1989, supra; and Ausubel et al., 1994, supra).

Non-limiting examples of such fusion proteins include
hemagglutinin fusions, glutathione-S-transferase (GST) fusions and maltose
binding protein (MBP) fusions. In certain embodiments, it might be beneficial to
introduce a protease cleavage site between the two polypeptide sequences
which have been fused. Such protease cleavage sites between two
heterologously fused polypeptides are well known in the art.

In certain embodiments, it might also be beneficial to fuse
the protein of the present invention to signal peptide sequences enabling a
secretion of the fusion protein from the host cell. Signal peptides from diverse
organisms are well known in the art. Bacterial OmpA and yeast Suc2 are two
non-limiting examples of proteins containing signal sequences. In certain
embodiments, it might also be beneficial to introduce a linker (commonly known)
between the interaction domain and the heterologous polypeptide portion. Such
fusion proteins find utility in the assays of the present invention as well as for
purification purposes, detection purposes and the like.

For certainty, the sequences and polypeptides useful to
practice the invention include, without being limited thereto, mutants, homologs,
subtypes, alleles and the like. It shall be understood that generally, the
sequences of the present invention should encode a functional (albeit defective)
AR. It will be clear to the person of ordinary skill that whether the AR sequence
of the present invention, variant, derivative, or fragment thereof retains its
function can be determined by using the teachings and assays of the present
invention and the general teachings of the art.

It should be understood that the AR protein of the present
invention can be modified, for example by in vitro mutagenesis, to dissect the
structure-function relationship thereof and permit a better design and
identification of modulating compounds. However, some derivatives or analogs
having lost their biological function may still find utility, for example for raising
antibodies. These antibodies could be used for detection or purification
purposes. In addition, these antibodies could also act as competitive or
non-competitive inhibitors and be found to be modulators of the activity of the AR
protein of the present invention.

A host cell or indicator cell has been "transfected" by
exogenous or heterologous DNA (e.g. a DNA construct) when such DNA has
been introduced inside the cell. The transfecting DNA may or may not be
integrated (covalently linked) into chromosomal DNA making up the genome of
the cell. In prokaryotes, yeast, and mammalian cells for example, the
transfecting DNA may be maintained on an episomal element such as a plasmid.
With respect to eukaryotic cells, a stably transfected cell is one in which the
transfecting DNA has become integrated into a chromosome so that it is
inherited by daughter cells through chromosome replication. This stability is
demonstrated by the ability of the eukaryotic cell to establish cell lines or clones
comprised of a population of daughter cells containing the transfecting DNA.
Transfection methods are well known in the art (Sambrook et al., 1989, supra;
Ausubel et al., 1994, supra). The use of a mammalian cell as indicator can
provide the advantage of furnishing an intermediate factor, which permits for
example the interaction of two polypeptides which are tested, that might not be
present in lower eukaryotes or prokaryotes. It will be understood that extracts
from mammalian cells for example could be used in certain embodiments, to
compensate for the lack of certain factors.

In general, techniques for preparing antibodies (including
monoclonal antibodies and hybridomas) and for detecting antigens using
antibodies are well known in the art (Campbell, 1984, in "Monoclonal Antibody
Technology: Laboratory Techniques in Biochemistry and Molecular Biology",
Elsevier Science Publisher, Amsterdam, The Netherlands; and Harlow et al.,
1988, in: Antibody - A Laboratory Manual, CSH Laboratories). The present
invention also provides polyclonal, monoclonal antibodies, or humanized
versions thereof, chimeric antibodies and the like which inhibit or neutralize their
respective interaction domains and/or are specific thereto.

From the specification and appended claims, the term
therapeutic agent should be taken in a broad sense so as to also include a
combination of at least two such therapeutic agents. Further, the DNA
segments or proteins according to the present invention could be introduced into
individuals in a number of ways. For example, cells can be isolated from the
afflicted individual, transformed with a DNA construct according to the invention
and reintroduced to the afflicted individual in a number of ways. Alternatively, the
DNA construct can be administered directly to the afflicted individual. The DNA
construct can also be delivered through a vehicle such as a liposome, which can
be designed to be targeted to a specific cell type, and engineered to be
administered through different routes. For example, an androgen receptor
gene having the genotype associated with low risk of breast cancer could be
introduced in cells or in an individual displaying the AR polymorphism associated
with high risk of breast cancer.

For administration to humans, the prescribing medical
professional will ultimately determine the appropriate form and dosage for a
given patient, and this can be expected to vary according to the chosen
therapeutic regimen (i.e. DNA construct, protein, cells), the response and
condition of the patient as well as the severity of the disease.

Compositions within the scope of the present invention
should contain the active agent (i.e. molecule, hormone) in an amount effective
to achieve the desired therapeutic effect while avoiding adverse side effects.
Typically, the nucleic acids in accordance with the present invention can be
administered to mammals (i.e. humans) in doses ranging from 0.005 to 1 mg per
kg of body weight per day of the mammal which is treated. Pharmaceutically
acceptable preparations and salts of the active agent are within the scope of the
present invention and are well known in the art (Remington's Pharmaceutical
Science, 16th Ed., Mack Ed.). For the administration of polypeptides,
antagonists, agonists and the like, the amount administered should be chosen
so as to avoid adverse side effects. The dosage will be adapted by the clinician
in accordance with conventional factors such as the extent of the disease and
different parameters from the patient. Typically, 0.001 to 50 mg/kg/day will be
administered to the mammal.
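
The per-kilogram ranges above translate into daily amounts by simple
multiplication with body weight. The sketch below shows that arithmetic only;
the function name and the example body weight of 70 kg are assumptions, and
the actual dosage remains a matter for the clinician as stated above.

```python
def daily_dose_range_mg(body_weight_kg: float,
                        low_mg_per_kg: float,
                        high_mg_per_kg: float) -> tuple:
    """Convert a mg/kg/day range into a mg/day range for a given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("lower bound exceeds upper bound")
    return (low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg)


# Ranges quoted in the text, applied to a hypothetical 70 kg individual.
nucleic_acid_range = daily_dose_range_mg(70.0, 0.005, 1.0)
other_agent_range = daily_dose_range_mg(70.0, 0.001, 50.0)
```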

The present invention relates to a kit for assessing a
predisposition to breast cancer comprising a determination of the genotype at
the AR locus (or a locus in linkage disequilibrium therewith) using a nucleic acid
fragment, a protein or a ligand, or a restriction enzyme in accordance with the
present invention. For example, a compartmentalized kit in accordance with the
present invention includes any kit in which reagents are contained in separate
containers. Such containers include small glass containers, plastic containers
or strips of plastic or paper. Such containers allow the efficient transfer of
reagents from one compartment to another compartment such that the samples
and reagents are not cross-contaminated and the agents or solutions of each
container can be added in a quantitative fashion from one compartment to
another. Such containers will include in one particular embodiment a container
which will accept the test sample (DNA, protein or cells), a container which
contains the primers used in the assay, containers which contain enzymes,
containers which contain wash reagents, and containers which contain the
reagents used to detect the extension products.

It will be readily recognized by the person of ordinary skill
that the nucleic acid sequences, probes, primers, antibodies and the like of the
present invention enabling a detection of the CAG repeat polymorphism of the
AR gene of the present invention can be incorporated into any one of numerous
established kit formats which are well known in the art.

Other objects, advantages and features of the present
invention will become more apparent upon reading of the following
non-restrictive description of preferred embodiments which is exemplary and
should not be interpreted as limiting the scope of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT 

In accordance with one embodiment of the invention,
there is provided a specific model for use in prediction of breast cancer
susceptibility and prognosis. The model comprises an androgen receptor gene
polymorphism that allows the identification of a subset of patients (i.e. women)
that are at significantly increased risk of breast cancer as compared to those
bearing other variants of this gene.

In accordance with a preferred embodiment of the present
invention, a single gene, the androgen receptor gene, has been identified. The
polymorphism of this gene is associated with a significant proportion of breast
cancer cases in the general population (up to 60% of all cases). A polymorphism
of this gene is, for example, the CAG repeat located in the first exon.

It was thus discovered in accordance with a preferred
embodiment of the present invention that testing for this polymorphism in the
androgen receptor (AR) gene makes it possible to distinguish between women at
lower risk of breast cancer and those at higher risk of the disease.

The present invention will be more readily understood by
referring to the following example which is given to illustrate the invention rather
than to limit its scope.

EXAMPLE 1 

Polymorphism of the CAG repeat of the androgen receptor as a marker 
for breast cancer susceptibility 

In a case-control study comparing 262 consecutive cases of
breast cancer in women and 465 control women matched for age, polymorphism
at the AR gene, namely, the CAG repeat coding for a polyglutamine tract in the
5' part of the AR gene located on chromosome X, was studied. Because of the
large number of alleles identified (15 different alleles), these alleles were
grouped arbitrarily in categories by size to simplify the analysis and increase the
number of individuals in each category. Table 1 presents the frequency of cases
and controls in categories of genotypes with the corresponding odds ratio for
breast cancer and the computed 95% confidence intervals. The AR gene alleles
were called arbitrarily A to E according to their size in CAG repeats, the shortest
alleles being A and the longest being called E. The shortest AR gene alleles
(corresponding to the polyglutamine stretch) or combinations of short alleles
(AA, AB, BB) are the genotypes that show the smallest breast cancer risk. This
shows that women with a certain combination of AR gene polymorphisms on
their two X chromosomes have a significantly increased risk of developing
breast cancer as compared to the category with the smallest risk. In fact, in this
cohort 32% of all cases of breast cancer were attributable to variation in the AR
gene. This is three to six times the number of breast cancer cases attributable
to the BRCA1 and BRCA2 genes. Indeed, in the cohort studied the 25% of
women with the AR genotypes associated with the smallest risk of breast cancer
comprised only 19% of all breast cancer cases while the 75% of women having
the AR genotypes associated with the highest risks of breast cancer had 81%
of all breast cancer cases. In other words, as compared with the general
population, for which the risk of breast cancer is 1:9 in women, women with
certain AR genotypes had a risk of 1:12 (much lower; i.e. a protective effect) while
the other group had a risk of 1:8 (larger). Thus, this novel genetic marker of
breast cancer makes it possible to identify a subgroup of women with a risk of
breast cancer close to two times larger than the other subgroup.

Table 1

Distribution of cases and controls among
females with various AR genotypes

AR genotype    Cases    Controls    Totals
A* + BB           49         134       183
BC to EE         213         331       544
Totals           262         465       727



Odds Ratio (OR) for breast cancer in BC to EE genotypes vs A* and BB = 1.76 (95%
confidence interval CI 1.22 to 2.55) 
Chi-square = 9.5 p=0.002 

Breast cancer risk attributable to AR gene variation = 32% (57 cases / 213 in the others 
category) 
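
By way of non-limiting illustration, the odds ratio and confidence interval quoted under Table 1 can be checked from the four cell counts. The Python sketch below assumes that the interval was obtained with the usual log-based (Woolf) approximation, which the text does not state; the function name `odds_ratio_ci` is hypothetical and introduced only for this example.

```python
import math

def odds_ratio_ci(exp_cases, exp_controls, ref_cases, ref_controls, z=1.96):
    """Odds ratio of the exposed group versus the reference group, with a
    Woolf (log-based) confidence interval. The Woolf method is an
    assumption; the source does not name the method it used."""
    odds_ratio = (exp_cases * ref_controls) / (ref_cases * exp_controls)
    # Standard error of ln(OR): square root of the summed reciprocal counts.
    se_log = math.sqrt(1 / exp_cases + 1 / exp_controls
                       + 1 / ref_cases + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts from Table 1: BC to EE (213 cases, 331 controls)
# versus A* + BB (49 cases, 134 controls).
or_, lo, hi = odds_ratio_ci(213, 331, 49, 134)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Under these assumptions the sketch gives an odds ratio of about 1.76 with an interval of about 1.22 to 2.55, in line with the values reported above.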


As will be clear to the skilled artisan, the different AR
alleles can be grouped differently according to size, and the invention should
therefore not be limited to particular groupings. As will be seen in Table 2,
grouping the alleles in three categories instead of five still enables a
demonstration of the significant association of the AR CAG-repeat polymorphism
with breast cancer.

In Table 2, the 15 different alleles were grouped in three
different categories (X, Y, and Z) instead of five, in which the shortest alleles are
in the X category, and the longest alleles are in the Z category. The six possible
genotypes were thus designated as "XX", "XY", "XZ", "YY", "YZ", and "ZZ"
genotypes. It is apparent from Table 2 that (CAG)n genotypes were associated
with the disease, as the genotypes with mid to large numbers of (CAG) repeats
were at significantly higher risk of developing the disease as compared to
genotypes with shorter (CAG)n tracts. Table 2 shows that women with
either the YY, YZ or ZZ genotypes had a 2.2-fold increased risk of breast cancer
compared to women with the XX or XY genotype, i.e. that women with the
latter genotypes had only a 1:20 lifetime risk for the disease as compared to a
1:9 risk for those with the larger genotypes.

Table 2

Association of androgen receptor
polymorphism with breast cancer

(CAG)n genotype                  XX or XY     XZ           YY, YZ or ZZ
                                 genotype     genotype     genotype
Cases                            10 (4%)*     28 (11%)*    212 (85%)*
Controls                         37 (8%)*     61 (13%)*    355 (78%)*
Odds Ratio (OR)                  1.0          1.7          2.2
95% CI for OR (min-max)                       0.7 - 3.9    1.1 - 4.5
Lifetime risk of breast cancer   1:20         1:12         1:9

* value in parenthesis represents percentage of total cases or controls 
CI Confidence interval expressed with the highest and lowest values. 



No significant interaction was observed between AR
genotypes and the body mass index (BMI), smoking habits, menopausal status
or family history of breast cancer. However, a striking combined influence of the
AR genotype with a positive history of breast benign disease (BBD) on the risk
of breast cancer was observed (Table 3). Women with a positive history of BBD
and AR genotypes combining the large AR alleles (Y or Z) had a relative risk of
3.5 as compared to women with no such history and AR genotypes comprised
of smaller alleles. When compared to carriers of XX, XY AR genotypes only
(who have the lowest risk of breast cancer) with no history of benign disease,
women with the AR-ZZ genotype had an odds ratio of 7.1 for breast cancer
(95% CI 2.3 to 22). Interestingly, the AR genotype was not associated with a
significant risk of breast cancer in women with no history of breast benign
disease. The present invention thus also provides, as an additional "marker" to
strengthen the prognosis/diagnosis/treatment methods and reagents according
to the present invention, a positive history of BBD.



Table 3

Association of breast cancer risk with
AR polymorphism and breast benign disease

                                              AR genotype
                                              XX, XY, XZ    YY, YZ, ZZ
Negative history of benign breast disease
  Cases                                       25 (10%)*     131 (52%)*
  Controls                                    73 (16%)*     288 (64%)*
  Odds Ratio                                  1.0           1.33
  95% CI for OR (min - max)                                 0.8-2.2
  Lifetime risk of breast cancer              1:16          1:12
Positive history of benign breast disease
  Cases                                       13 (5%)*      81 (32%)*
  Controls                                    25 (6%)*      67 (15%)*
  Odds Ratio                                  1.5           3.5
  95% CI for OR (min - max)                   0.7-3.4       2.0-6.2
  Lifetime risk of breast cancer              1:11          1:4

* value in parenthesis represents percentage of total cases or controls
CI Confidence interval expressed with the highest and lowest values. 
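
By way of non-limiting illustration, the odds ratios of Table 3 can be recomputed from the case and control counts by comparing each stratum with the reference stratum (XX, XY, XZ genotypes with no history of benign breast disease). The Python sketch below uses only the counts given in the table; the names `REFERENCE`, `STRATA` and `odds_ratio_vs_reference` are hypothetical and chosen for this example.

```python
# Counts (cases, controls) taken from Table 3.
REFERENCE = (25, 73)  # XX, XY, XZ genotypes, negative history of BBD
STRATA = {
    "YY/YZ/ZZ, no BBD": (131, 288),
    "XX/XY/XZ, BBD": (13, 25),
    "YY/YZ/ZZ, BBD": (81, 67),
}

def odds_ratio_vs_reference(stratum, reference=REFERENCE):
    """Odds of being a case in the stratum divided by the odds of being a
    case in the reference stratum."""
    cases, controls = stratum
    ref_cases, ref_controls = reference
    return (cases / controls) / (ref_cases / ref_controls)

stratum_ors = {name: odds_ratio_vs_reference(counts)
               for name, counts in STRATA.items()}
for name, value in stratum_ors.items():
    print(f"{name}: OR = {value:.2f}")
```

The three values obtained this way are approximately 1.33, 1.5 and 3.5, consistent with the tabulated odds ratios.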

Up to now, no marker displaying such large odds ratios had
been reported for breast cancer. Furthermore, this genetic marker and
polymorphisms in the AR gene play a very significant role in breast cancer
susceptibility in women, as evidenced by the very significant association
demonstrated herein. The present invention also points to alternative therapies
for breast cancer aiming at restoring the efficacy of the AR in women with a
reduced function of their AR genes due to the variant genotypes that they carry.
The described assays of the present invention could enable the identification
of such therapies.

Although the present invention has been described 
hereinabove by way of preferred embodiments thereof, it can be modified, 
without departing from the spirit and nature of the subject invention as defined 
in the appended claims. 

SEQUENCE LISTING

SEQ ID NO:1 5'-TCCAGAATCTGTTCCAGAGCGTGC-3'
SEQ ID NO:2 5'-GCTGTGAAGGTTGCTGTTCCTCAT-3'

WHAT IS CLAIMED IS:

1. A method of determining an individual's predisposition to
breast cancer, development of breast cancer and/or responsiveness to therapy
for breast cancer, said method comprising the step of determining the androgen
receptor genotype of the individual, thereby determining an individual's
predisposition to breast cancer, development of breast cancer and/or
responsiveness to therapy for breast cancer.

2. The method of claim 1, wherein the androgen receptor
genotype is determined using a nucleic acid variant in linkage disequilibrium with
a CAG repeat in an androgen receptor gene.

3. The method of claim 2, wherein the androgen receptor
genotype is determined by determining the number of CAG repeats within the
androgen receptor gene.

4. The method of claim 3, which further comprises a step
of amplifying a segment of the androgen receptor using polymerase chain
reaction.

5. The method of claim 4, wherein a pair of primers derived
from a nucleic acid sequence of the androgen receptor gene or flanking said
gene is used in the polymerase chain reaction.

6. The method of claim 5, wherein the segment of the
androgen receptor gene is amplified using a pair of primers as follows:

5'-TCCAGAATCT GTTCCAGAGC GTGC-3' SEQ ID NO:1; and

5'-GCTGTGAAGG TTGCTGTTCC TCAT-3' SEQ ID NO:2.

7. An assay for screening and selecting an agent which
modulates breast cancer predisposition comprising:

a) a recombinant androgen receptor gene or functional
fragment thereof, which comprises the CAG repeat thereof, or a marker in
linkage disequilibrium therewith; and

b) assaying a function of said androgen receptor;

wherein an allele which modulates said function of an androgen receptor can be
selected,

and wherein a modulation of a function of said androgen receptor is associated
with a modulation of said breast cancer predisposition, whereby short CAG
repeat of said AR positively modulates AR receptor function, thereby leading to
breast cancer protection.

8. An assay for screening and selecting an agent which
modulates breast cancer predisposition comprising:

a) an expression vector comprising a promoter operably
linked to a reporter gene, said promoter comprising an androgen response
element, said response element affecting the activity of said promoter upon
binding thereto of androgen or analog thereof;

b) a cell expressing a chosen allele of an androgen receptor
and harboring said vector of a);

c) submitting said cell to at least one agent; and

d) assaying a level of said reporter gene;

whereby an agent can be selected when the level of said reporter gene is
significantly modulated by the presence of said agent.

9. A method of using specific alleles of the androgen
receptor gene, or a variant, equivalent, or mutation thereof which shows linkage
disequilibrium therewith, to set up a screening assay for agents destined to
modulate androgen receptor function for the purpose of identifying agents
involved in breast cancer.

METHODS AND COMPOSITIONS FOR PREDICTING
PROSTATE CANCER

This application claims priority to United States Provisional Application No.
60/221,074 filed on July 27, 2000, which application is herein incorporated by 
reference in its entirety. 

I. BACKGROUND OF THE INVENTION

The incidence of clinical prostate cancer differs substantially between ethnic
groups, with African Americans having a 10- to 40-fold higher incidence than
Asians (1-3 AR). Such disparity in incidence of clinical prostate cancer cannot be
explained entirely by population differences in screening. An earlier study shows
that after adjustment for screening, there is still a 3- to 4-fold difference in incidence
rates between U.S. and Japanese men, whose rates are among the highest in Asians
(4AR). Despite the dramatic racial variation in clinical prostate cancer incidence,
the prevalence of latent carcinoma appears to be similar across populations (5 AR),
suggesting that there exist differences in factors (either genetic or environmental)
that promote the progression of microscopic tumors to clinically overt carcinoma.

The growth, differentiation, and proliferation of prostatic cells are regulated 
by androgens (6AR). The biological effects of androgens are mediated through 
binding to the intracellular androgen receptor (AR), which in turn regulates the 
transcription of target genes with the assistance of transcriptional coactivators 
(7AR).

II. SUMMARY OF THE INVENTION 

In accordance with the purposes of this invention, as embodied and broadly
described herein, this invention, in one aspect, relates to compositions and methods
for assessing prostate cancer risk.

Additional advantages of the invention will be set forth in part in the
description which follows, and in part will be obvious from the description, or may
be learned by practice of the invention. The advantages of the invention will be
realized and attained by means of the elements and combinations particularly
pointed out in the appended claims. It is to be understood that both the foregoing
general description and the following detailed description are exemplary and
explanatory only and are not restrictive of the invention, as claimed.

III. BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part 
of this specification, illustrate several embodiments of the invention and together 
with the description, serve to explain the principles of the invention. 

Figure 1. (AR) shows the percent distribution of the number of CAG repeats.

IV. DETAILED DESCRIPTION 

The present invention may be understood more readily by reference to the 
following detailed description of preferred embodiments of the invention and the 
Examples included therein and to the Figures and their previous and following 
description.

Before the present compounds, compositions, articles, devices, and/or 
methods are disclosed and described, it is to be understood that this invention is not 
limited to specific synthetic methods, specific recombinant biotechnology methods 
unless otherwise specified, or to particular reagents unless otherwise specified, as 
such may, of course, vary. It is also to be understood that the terminology used
herein is for the purpose of describing particular embodiments only and is not 
intended to be limiting. 

A. Definitions 

As used in the specification and the appended claims, the singular forms "a," 
"an" and "the" include plural referents unless the context clearly dictates otherwise.
Thus, for example, reference to "a pharmaceutical carrier" includes mixtures of two 
or more such carriers, and the like. 

Ranges may be expressed herein as from "about" one particular value, and/or 
to "about" another particular value. When such a range is expressed, another 
embodiment includes from the one particular value and/or to the other particular
value. Similarly, when values are expressed as approximations, by use of the
antecedent "about," it will be understood that the particular value forms another
embodiment. It will be further understood that the endpoints of each of the ranges
are significant both in relation to the other endpoint, and independently of the other
endpoint. 

In this specification and in the claims which follow, reference will be made 
to a number of terms which shall be defined to have the following meanings: 

"Optional" or "optionally" means that the subsequently described event or 
circumstance may or may not occur, and that the description includes instances
where said event or circumstance occurs and instances where it does not. 

"Primers" are a subset of probes which are capable of supporting some type 
of enzymatic manipulation and which can hybridize with a target nucleic acid such 
that the enzymatic manipulation can occur. A primer can be made from any 
combination of nucleotides or nucleotide derivatives or analogs available in the art
which do not interfere with the enzymatic manipulation. 

"Probes" are molecules capable of interacting with a target nucleic acid, 
typically in a sequence specific manner, for example through hybridization. The 
hybridization of nucleic acids is well understood in the art and discussed herein. 
Typically a probe can be made from any combination of nucleotides or nucleotide
derivatives or analogs available in the art. 

B. Compositions 

Disclosed are the components to be used to prepare the disclosed
compositions as well as the compositions themselves and to be used within the
methods disclosed herein. These and other materials are disclosed herein, and it is
understood that when combinations, subsets, interactions, groups, etc. of these
materials are disclosed that while specific reference of each various individual and
collective permutation of these compounds may not be explicitly disclosed, each is
specifically contemplated and described herein. For example, if a particular AIB1
protein or gene is disclosed and discussed and a number of modifications that can be
made to a number of molecules including the AIB1 protein or gene are discussed,
specifically contemplated is each and every combination and permutation of AIB1
protein or gene and the modifications that are possible unless specifically indicated
to the contrary. Thus, if a class of molecules A, B, and C are disclosed as well as a
class of molecules D, E, and F and an example of a combination molecule, A-D is
disclosed, then even if each is not individually recited each is individually and
collectively contemplated meaning combinations, A-E, A-F, B-D, B-E, B-F, C-D,
C-E, and C-F are considered disclosed. Likewise, any subset or combination of these
is also disclosed. Thus, for example, the sub-group of A-E, B-F, and C-E would be
considered disclosed. This concept applies to all aspects of this application
including, but not limited to, steps in methods of making and using the disclosed
compositions. Thus, if there are a variety of additional steps that can be performed it
is understood that each of these additional steps can be performed with any specific
embodiment or combination of embodiments of the disclosed methods.
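
By way of non-limiting illustration, the pairing of every member of the class A, B, C with every member of the class D, E, F can be enumerated mechanically. The Python sketch below is only a reading aid; the variable names are hypothetical.

```python
from itertools import product

class_one = ["A", "B", "C"]
class_two = ["D", "E", "F"]

# Every pairing of one member of the first class with one of the second.
combinations = [f"{x}-{y}" for x, y in product(class_one, class_two)]
print(combinations)
```

The enumeration yields nine pairings: the recited example A-D together with the eight further combinations listed above.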

It is understood that one way to define any known variants and derivatives or
those that might arise, of the disclosed genes and proteins herein is through defining
the variants and derivatives in terms of homology to specific known sequences. For
example, SEQ ID NO:18 sets forth a particular sequence of an AIB1 gene and SEQ
ID NO:19 sets forth a particular sequence of the protein encoded by SEQ ID NO:18,
an AIB1 protein. Specifically disclosed are variants of these and other genes and
proteins herein disclosed which have at least 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 percent
homology to the stated sequence. Those of skill in the art readily understand how to
determine the homology of two proteins or nucleic acids, such as genes. For
example, the homology can be calculated after aligning the two sequences so that the
homology is at its highest level.

Another way of calculating homology can be performed by published
algorithms. Optimal alignment of sequences for comparison may be conducted by
the local homology algorithm of Smith and Waterman Adv. Appl. Math. 2: 482
(1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol.
Biol. 48: 443 (1970), by the search for similarity method of Pearson and Lipman,
Proc. Natl. Acad. Sci. U.S.A. 85: 2444 (1988), by computerized implementations of
these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics
Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by
inspection. 
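
By way of non-limiting illustration, a percent homology figure of the kind discussed above can be obtained by globally aligning two sequences and counting identical aligned positions. The Python sketch below implements a basic Needleman and Wunsch style alignment with a linear gap penalty. The scoring values (match 1, mismatch -1, gap -1), the choice of the alignment length as the denominator, and the function names are all assumptions made for this example and are not taken from the specification or from the software packages named above.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings, with '-' marking a gap."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(a, b):
    """Identical aligned positions as a percentage of the alignment length
    (one of several conventions; chosen here as an assumption)."""
    aligned_a, aligned_b = global_alignment(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)

print(percent_identity("ACGT", "ACGA"))
```

For the two four-nucleotide inputs shown, the best alignment has no gaps and three identical positions, so the sketch reports 75.0 percent.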

The same types of homology can be obtained for nucleic acids by, for
example, the algorithms disclosed in Zuker, M. Science 244:48-52, 1989, Jaeger et
al. Proc. Natl. Acad. Sci. USA 86:7706-7710, 1989, Jaeger et al. Methods Enzymol.
183:281-306, 1989, which are herein incorporated by reference for at least material
related to nucleic acid alignment.

The term hybridization typically means a sequence driven interaction
between at least two nucleic acid molecules, such as a primer or a probe and a gene.
Sequence driven interaction means an interaction that occurs between two
nucleotides or nucleotide analogs or nucleotide derivatives in a nucleotide specific
manner. For example, G interacting with C or A interacting with T are sequence
driven interactions. Typically sequence driven interactions occur on the
Watson-Crick face or Hoogsteen face of the nucleotide. The hybridization of two
nucleic acids is affected by a number of conditions and parameters known to those
of skill in the art. For example, the salt concentrations, pH, and temperature of the
reaction all affect whether two nucleic acid molecules will hybridize.

Parameters for selective hybridization between two nucleic acid molecules
are well known to those of skill in the art. For example, in some embodiments
selective hybridization conditions can be defined as stringent hybridization
conditions. For example, stringency of hybridization is controlled by both
temperature and salt concentration of either or both of the hybridization and washing
steps. For example, the conditions of hybridization to achieve selective
hybridization may involve hybridization in high ionic strength solution (6X SSC or
6X SSPE) at a temperature that is about 12-25°C below the Tm (the melting
temperature at which half of the molecules dissociate from their hybridization
partners) followed by washing at a combination of temperature and salt
concentration chosen so that the washing temperature is about 5°C to 20°C below
the Tm. The temperature and salt conditions are readily determined empirically in
preliminary experiments in which samples of reference DNA immobilized on filters
are hybridized to a labeled nucleic acid of interest and then washed under conditions
of different stringencies. Hybridization temperatures are typically higher for
DNA-RNA and RNA-RNA hybridizations. The conditions can be used as described
above to achieve stringency, or as is known in the art. (Sambrook et al., Molecular
Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold
Spring Harbor, New York, 1989; Kunkel et al. Methods Enzymol. 154:367,
1987, which are herein incorporated by reference for material at least related to
hybridization of nucleic acids). A preferable stringent hybridization condition for a
DNA:DNA hybridization can be at about 68°C (in aqueous solution) in 6X SSC or
6X SSPE followed by washing at 68°C. Stringency of hybridization and washing, if
desired, can be reduced accordingly as the degree of complementarity desired is
decreased, and further, depending upon the G-C or A-T richness of any area wherein
variability is searched for. Likewise, stringency of hybridization and washing, if
desired, can be increased accordingly as homology desired is increased, and further,
depending upon the G-C or A-T richness of any area wherein high homology is
desired, all as known in the art.
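
By way of non-limiting illustration, the temperature windows described above follow from the Tm by simple subtraction. The Python sketch below encodes the two ranges given in the text (hybridization about 12-25°C below the Tm, washing about 5-20°C below the Tm). The helper `wallace_tm`, which estimates the Tm of a short oligonucleotide as 2°C per A/T plus 4°C per G/C, is an assumption added for convenience; the specification itself states that conditions are determined empirically, and all function names and the example Tm of 80°C are hypothetical.

```python
def hybridization_window(tm_celsius):
    """Hybridization temperature range: about 12-25 °C below the Tm."""
    return (tm_celsius - 25, tm_celsius - 12)

def wash_window(tm_celsius):
    """Washing temperature range: about 5-20 °C below the Tm."""
    return (tm_celsius - 20, tm_celsius - 5)

def wallace_tm(oligo):
    """Rough Tm estimate for a short oligonucleotide (Wallace rule):
    2 °C per A or T plus 4 °C per G or C. An assumption for illustration,
    not a method taken from the specification."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

tm = 80.0  # hypothetical melting temperature in °C, for illustration only
print(hybridization_window(tm))  # (55.0, 68.0)
print(wash_window(tm))           # (60.0, 75.0)
```

For a nucleic acid with a Tm of 80°C, the sketch therefore suggests hybridizing at roughly 55-68°C and washing at roughly 60-75°C, subject to the empirical adjustment described above.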

Another way to define selective hybridization is by looking at the amount
(percentage) of one of the nucleic acids bound to the other nucleic acid. For
example, in some embodiments selective hybridization conditions would be when at
least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100 percent of the limiting nucleic acid is bound to the non-limiting nucleic acid.
Typically, the non-limiting primer is in, for example, 10 or 100 or 1000 fold excess.
This type of assay can be performed under conditions where both the limiting and
non-limiting primer are, for example, 10 fold or 100 fold or 1000 fold below their kd,
or where only one of the nucleic acid molecules is 10 fold or 100 fold or 1000 fold
or where one or both nucleic acid molecules are above their kd.

Another way to define selective hybridization is by looking at the percentage
of primer that gets enzymatically manipulated under conditions where hybridization
is required to promote the desired enzymatic manipulation. For example, in some
embodiments selective hybridization conditions would be when at least 5, 10, 15,
20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 percent of
the primer is enzymatically manipulated under conditions which promote the
enzymatic manipulation, for example if the enzymatic manipulation is DNA
extension, then selective hybridization conditions would be when at least 5, 10, 15,
20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 percent of
the primer molecules are extended. Preferred conditions also include those
suggested by the manufacturer or indicated in the art as being appropriate for the
enzyme performing the manipulation.

1. AIB1/SRC3

The AIB1 protein (Amplified In Breast cancer 1, also known as Steroid
Receptor Coactivator-3, or SRC-3), encoded by the AIB1/SRC-3 gene located on
chromosome 20 (20q12), is an AR coactivator and a member of the steroid receptor
coactivator family, which interacts with members of the nuclear hormone receptor
family (14,15 AIB1). Like the AR protein, the AIB1/SRC-3 coactivator contains a
stretch of glutamine residues encoded by a variable-length tract of CAG/CAA
repeats in the AIB1/SRC-3 gene.

a) Sequences 

There are a variety of sequences related to the AIB1 gene having the
following Genbank Accession Numbers: NT_011362, XM_030033, XM_030032,
XM_009483, XM_030031, XM_030034, AL353777, AL034418, AL021394,
AY008258, AF322224, NM_008679, NM_006534, AF012108, and AF044080;
these sequences and others are herein incorporated by reference in their entireties as
well as for individual subsequences contained therein.

One particular sequence set forth in SEQ ID NO:18 and having Genbank
accession number XM_030032 is used herein, as an example, to exemplify the
disclosed compositions and methods. It is understood that the description related to
this sequence is applicable to any sequence related to AIB1 unless specifically
indicated otherwise. Those of skill in the art understand how to resolve sequence
discrepancies and differences and to adjust the compositions and methods relating to
a particular sequence to other related sequences (i.e. sequences of AIB1). Primers
and/or probes can be designed for any AIB1 sequence given the information
disclosed herein and known in the art.

b) CAG/CAA region

The CAG/CAA region of the AIB1 gene is located in the coding region of
the AIB1 gene. For example, the CAG/CAA region in SEQ ID NO:18 can be
defined by the region from nucleotide 3930 to nucleotide 4016. In certain
embodiments of the disclosed compositions and methods, this represents the
CAG/CAA region of the AIB1 gene. However, it is understood that various
mutations, alterations, or other genetic variation including allelic variation can occur
in certain individuals and those of skill in the art understand how to locate this
region within any given AIB1 gene variant. Thus, for example, if a nucleic acid is
amplified from SEQ ID NO:18 or any variant of the AIB1 gene, that contains
only a fragment of the AIB1 gene, for example, a fragment of 1000 nucleotides, it is
understood that the CAG/CAA region could be located within this molecule if it is 
included in the molecule, in whole or in part. 
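
By way of non-limiting illustration, a CAG/CAA region can be located within an arbitrary fragment by searching for the longest uninterrupted run of CAG or CAA triplets. The Python sketch below is one simple way of doing so; the function name `longest_cag_caa_run` and the toy input sequence are hypothetical, and the search makes no attempt to verify the reading frame or to tolerate interrupting codons.

```python
import re

def longest_cag_caa_run(sequence):
    """Return (start, end, triplet_count) for the longest uninterrupted run
    of CAG/CAA triplets, using 1-based inclusive coordinates, or None if
    the sequence contains no such triplet."""
    best = None
    for m in re.finditer(r"(?:CAG|CAA)+", sequence.upper()):
        if best is None or len(m.group()) > len(best.group()):
            best = m
    if best is None:
        return None
    return best.start() + 1, best.end(), len(best.group()) // 3

# Toy fragment (not SEQ ID NO:18): six CAG/CAA triplets flanked by TTT and GGG.
fragment = "TTT" + "CAG" * 3 + "CAA" + "CAG" * 2 + "GGG"
print(longest_cag_caa_run(fragment))  # (4, 21, 6)
```

Applied to the coordinates given above for SEQ ID NO:18, the span from nucleotide 3930 to nucleotide 4016 covers 87 nucleotides, that is, 29 triplets.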

c) Primers and probes

Disclosed are compositions including primers and probes, which are capable
of interacting with the AIB1 gene as disclosed herein. In certain embodiments the
primers are used to support DNA amplification reactions. Typically the primers will
be capable of being extended in a sequence specific manner. Extension of a primer
in a sequence specific manner includes any methods wherein the sequence and/or
composition of the nucleic acid molecule to which the primer is hybridized or
otherwise associated directs or influences the composition or sequence of the
product produced by the extension of the primer. Extension of the primer in a
sequence specific manner therefore includes, but is not limited to, PCR, DNA
sequencing, DNA extension, DNA polymerization, RNA transcription, or reverse
transcription. Techniques and conditions that amplify the primer in a sequence
specific manner are preferred. In certain embodiments the primers are used for the
DNA amplification reactions, such as PCR or direct sequencing. It is understood
that in certain embodiments the primers can also be extended using non-enzymatic
techniques, where for example, the nucleotides or oligonucleotides used to extend
the primer are modified such that they will chemically react to extend the primer in a
sequence specific manner. Typically the disclosed primers hybridize with the AIB1
gene or region of the AIB1 gene or they hybridize with the complement of the AIB1
gene or complement of a region of the AIB1 gene.

Disclosed are primers that are capable of amplifying the CAG/CAA region of
the AIB1 gene. In certain embodiments the primers amplify the CAG/CAA region
of the AIB1 gene from nucleotide 3930 to nucleotide 4016 of the sequence set forth
in SEQ ID NO:18. In certain embodiments the primers are "outside" the CAG/CAA
region of the AIB1 gene. By outside the region it is intended to indicate that no
region of the primer is intended to interact directly with the CAG/CAA region. For
example, a primer outside the CAG/CAA region of the sequence set forth in SEQ ID
NO:18 could hybridize with nucleotide 4017 to nucleotide 4037 of SEQ ID NO:18,
but it would not be intended to hybridize with nucleotide 4016, and under the
conditions designed for the enzymatic manipulation of the primer, it would not
appreciably interact with nucleotide 4016. In other embodiments the primers are
designed to interact with one or more nucleotides considered to be part of the
CAG/CAA region, which primers herein are referred to as "inside" primers. Thus,
an inside primer can include a primer region that can under conditions appropriate
for the desired enzymatic manipulation hybridize with one or more nucleotides
considered within the CAG/CAA region and contain a region that hybridizes with
nucleotides that are not considered part of the CAG/CAA region. Thus, in one
embodiment an inside primer would interact with the nucleotide at position 4016 of
SEQ ID NO: 18. 
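
The outside/inside distinction can be illustrated with a minimal sketch. The function name `classify_primer` and the second example footprint are hypothetical and chosen only for illustration; the region coordinates (3930 to 4016 of SEQ ID NO:18) and the outside footprint (4017 to 4037) come from the text. The sketch treats any coordinate overlap as "inside" and ignores hybridization conditions, which is a simplifying assumption.

```python
def classify_primer(primer_start, primer_end, region_start, region_end):
    """Classify a primer footprint relative to a repeat region.

    Coordinates are 1-based and inclusive. Any overlap with the region
    counts as "inside"; no overlap counts as "outside". This is a
    simplification that ignores hybridization conditions.
    """
    if primer_start > primer_end or region_start > region_end:
        raise ValueError("start coordinate must not exceed end coordinate")
    overlaps = primer_start <= region_end and primer_end >= region_start
    return "inside" if overlaps else "outside"


# CAG/CAA region of SEQ ID NO:18 as stated in the text.
AIB1_CAG_CAA_REGION = (3930, 4016)

# Footprint 4017-4037 is the outside example given in the text.
outside_example = classify_primer(4017, 4037, *AIB1_CAG_CAA_REGION)

# Hypothetical footprint that reaches position 4016, so it is an inside primer.
inside_example = classify_primer(4010, 4030, *AIB1_CAG_CAA_REGION)

print(outside_example, inside_example)
```
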

The size of the primers for interaction with the AIB1 gene in certain
embodiments can be any size that supports the desired enzymatic manipulation of
the primer, such as DNA amplification. A typical AIB1 primer would be at least 6,
7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 
74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 
96, 97, 98, 99, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425,
450, 475, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1250, 1500, 1750,
2000, 2250, 2500, 2750, 3000, 3500, or 4000 nucleotides long. 

In other embodiments an AIB1 primer can be less than or equal to 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 
98, 99, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 
475, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1250, 1500, 1750, 
2000, 2250, 2500, 2750, 3000, 3500, or 4000 nucleotides long. 

In certain embodiments the primers are designed such that they are outside
primers whose nearest point of interaction with the AIB1 gene is within 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94,
95, 96, 97, 98, 99, 100, 125, 150, 175, or 200 nucleotides of the outermost defining 
nucleotide of the CAG/CAA region or complement of the CAG/CAA region. 

In certain embodiments the primers are designed such that they are outside
primers whose nearest point of interaction with the AIB1 gene is at least 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 
73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 
95, 96, 97, 98, 99, 100, 125, 150, 175, or 200 nucleotides away from the outermost
defining nucleotide of the CAG/CAA region or complement of the CAG/CAA
region.

For example, with respect to the AIB1 gene set forth in SEQ ID NO:18,
certain embodiments of the primers would be designed such that they are outside
primers whose nearest point of interaction with the AIB1 gene occurs at position
4017, 4018, 4019, 4020, 4021, 4022, 4023, 4024, 4025, 4026, 4027, 4028, 4029,
4030, 4031, 4032, 4033, 4034, 4035, 4036, 4037, 4038, 4039, 4040, 4041, 4042,
4043, 4044, 4045, 4046, 4047, 4048, 4049, 4050, 4051, 4052, 4053, 4054, 4055,
4056, 4057, 4058, 4059, 4060, 4061, 4062, 4063, 4064, 4065, 4066, 4067, 4068,
4069, 4070, 4071, 4072, 4073, 4074, 4075, 4076, 4077, 4078, 4079, 4080, 4081,
4082, 4083, 4084, 4085, 4086, 4087, 4088, 4089, 4090, 4091, 4092, 4093, 4094,
4095, 4096, 4097, 4098, 4099, 4100, 4101, 4102, 4103, 4104, 4105, 4106, 4107,
4108, 4109, 4110, 4111, 4112, 4113, 4114, 4115, 4116, 4117, 4142, 4167, 4192,
or 4217 of SEQ ID NO:18.
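
The listed positions follow from simple arithmetic: the first nucleotide past the CAG/CAA region (4016 + 1 = 4017) plus each enumerated distance (0 through 100, then 125, 150, 175, and 200). The following minimal sketch reproduces the list under that reading; the names `OFFSETS` and `outside_positions` are hypothetical and not part of the disclosure.

```python
# Distances enumerated in the text: 0 through 100, then 125, 150, 175, 200.
OFFSETS = list(range(0, 101)) + [125, 150, 175, 200]


def outside_positions(region_end, offsets=OFFSETS):
    """Return candidate positions for an outside primer's nearest point of
    interaction on the 3' side of a region ending at `region_end`."""
    first_outside = region_end + 1  # first nucleotide not in the region
    return [first_outside + offset for offset in offsets]


# The CAG/CAA region of SEQ ID NO:18 ends at nucleotide 4016.
aib1_positions = outside_positions(4016)

print(aib1_positions[0], aib1_positions[-1], len(aib1_positions))
```
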

The primers for the AIB1 gene typically will be used to produce an amplified
DNA product that contains the CAG/CAA region of the AIB1 gene. Typically the
size of the product will be such that the size can be accurately determined to
within 3, 2, or 1 nucleotides.

In certain embodiments this product is at least 20, 21, 22, 23, 24, 25, 26, 27, 
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 
94, 95, 96, 97, 98, 99, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 
400, 425, 450, 475, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1250, 
1500, 1750, 2000, 2250, 2500, 2750, 3000, 3500, or 4000 nucleotides long. 

In other embodiments the product is less than or equal to 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 
69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325,
350, 375, 400, 425, 450, 475, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 
1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3500, or 4000 nucleotides 
long. 

2. Androgen receptor

Androgen receptor (AR) plays a key role in intraprostatic androgenic action.
Within the prostate gland, testosterone is converted into dihydrotestosterone (DHT),
a more potent androgen. DHT then binds to the AR to form an intracellular DHT-AR
complex, which in turn modulates prostatic target genes to induce proliferation.

The AR protein, consisting of 918 amino acids and encoded singly by the AR
gene located on the X chromosome (Xq11-12), has three major functional domains:
a transactivating amino-terminal domain, a DNA binding domain, and a ligand
(steroid) binding domain (8AR). The open reading frame of the AR gene is
separated over eight exons and has a length of 2,730 base pairs (bp). The sequence
encoding the large amino-terminal transactivating domain is found in the first exon;
the DNA binding domain is encoded by exons 2 and 3; and the information for the
ligand binding domain is distributed over exons 4 to 8 (8AR).

The first exon of the AR gene contains two polymorphic trinucleotide repeat
segments that encode polyglutamine and polyglycine tracts localized in the
N-terminal transactivation domain of the AR protein. The polyglutamine tract is
encoded by a CAG trinucleotide repeat, and the polyglycine stretch by a GGN
repeat. The number of CAG repeats ranges from about 8 to 35 repeats in normal
individuals. Longer CAG repeat lengths appear to result in reduced AR
transcriptional activity both in vivo and in vitro (9,10AR). Otherwise healthy men
whose androgen receptor has a CAG repeat length at the long end of the normal
range (>28) have an increased incidence of impaired spermatogenesis and infertility
(11AR), conditions that are extremely androgen-dependent (12AR). Expansion of
the CAG repeat length to over 40 repeats is related to a rare neuromuscular disorder,
spinal and bulbar muscular atrophy (Kennedy syndrome), which is also associated
with androgen insensitivity, decreased virilization, testicular atrophy, reduced sperm
production, and infertility (13-15AR). Together, these clinical data suggest that a
longer CAG repeat length decreases the functional competence of AR.

The length of the polyglycine (GGN) tract varies from about 10 to 30 repeats. 
The functional consequences of variation in the GGN tract are less clear. Deletion 
of the polyglycine tract reduces AR transcriptional activity by about 30% in transient
transfection assays (16AR), although there is no significant correlation between
polyglycine tract length and infertility (11AR). 

a) Sequences 

There are a variety of sequences related to the androgen receptor gene;
Genbank Accession Numbers NM_000044 and 149399 are two examples.
These sequences and others available on Genbank are herein
incorporated by reference in their entireties as well as for individual subsequences
contained therein.

One particular sequence set forth in SEQ ID NO:20 and having Genbank
accession number NM_000044 is used herein, as an example, to exemplify the
disclosed compositions and methods. It is understood that the description related to
this sequence is applicable to any sequence related to androgen receptor unless
specifically indicated otherwise. Those of skill in the art understand how to resolve
sequence discrepancies and differences and to adjust the compositions and methods
relating to a particular sequence to other related sequences. Primers and/or probes
can be designed for any androgen receptor sequence given the information disclosed
herein and known in the art.

b) CAG/CAA region

The CAG/CAA region in SEQ ID NO:20 can be defined by the region from
nucleotide 1286 to nucleotide 1348. In certain embodiments of the disclosed
compositions and methods, this represents the CAG/CAA region of the androgen
receptor gene. However, it is understood that various mutations, alterations, or other
genetic variation including allelic variation can occur in certain individuals and
those of skill in the art understand how to locate this region within any given
androgen receptor gene variant. Thus, for example, if a nucleic acid is amplified
from SEQ ID NO:20 or any variant of the androgen receptor gene, that contains only
1000 nucleotides, it is understood that the CAG/CAA region could be located within
this molecule if it is included in the molecule, in whole or in part.

c) GGN region 

The GGN region in SEQ ID NO:20 can be defined by the region from
nucleotide 2459 to nucleotide 2530. In certain embodiments of the disclosed
compositions and methods, this represents the GGN region of the androgen receptor
gene. However, it is understood that various mutations, alterations, or other genetic
variation including allelic variation can occur in certain individuals and those of skill
in the art understand how to locate this region within any given androgen receptor
gene variant. Thus, for example, if a nucleic acid is amplified from SEQ ID NO:20
or any variant of the androgen receptor gene, that contains only 1000 nucleotides, it
is understood that the GGN region could be located within this molecule if it is
included in the molecule, in whole or in part.
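
As a worked check on the coordinates given for the reference sequences, the inclusive length of each region can be converted to a number of trinucleotide units. This is a minimal sketch; the function name `repeat_units_in_region` and the dictionary labels are hypothetical, and the calculation assumes each region consists entirely of whole three-nucleotide units.

```python
def repeat_units_in_region(first_nt, last_nt, unit_length=3):
    """Number of repeat units spanned by an inclusive nucleotide range."""
    length = last_nt - first_nt + 1  # inclusive coordinates
    if length <= 0:
        raise ValueError("last_nt must not precede first_nt")
    if length % unit_length != 0:
        raise ValueError("region is not a whole number of repeat units")
    return length // unit_length


# Region coordinates as stated in the text for each reference sequence.
REGIONS = {
    "AIB1 CAG/CAA (SEQ ID NO:18)": (3930, 4016),
    "AR CAG/CAA (SEQ ID NO:20)": (1286, 1348),
    "AR GGN (SEQ ID NO:20)": (2459, 2530),
}

unit_counts = {name: repeat_units_in_region(*span) for name, span in REGIONS.items()}

for name, count in unit_counts.items():
    print(f"{name}: {count} units")
```

Under these assumptions the reference regions span 87, 63, and 72 nucleotides, that is 29, 21, and 24 trinucleotide units respectively.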

d) Primers and probes

Disclosed are compositions including primers and probes, which are capable
of interacting with the androgen receptor gene as disclosed herein. In certain
embodiments the primers are used to support DNA amplification reactions.
Typically the primers will be capable of being extended in a sequence specific
manner. Extension of a primer in a sequence specific manner includes any methods
wherein the sequence and/or composition of the nucleic acid molecule to which the
primer is hybridized or otherwise associated directs or influences the composition or
sequence of the product produced by the extension of the primer. Extension of the
primer in a sequence specific manner therefore includes, but is not limited to, PCR,
DNA sequencing, DNA extension, DNA polymerization, RNA transcription, or
reverse transcription. Techniques and conditions that amplify the primer in a
sequence specific manner are preferred. In certain embodiments the primers are
used for the DNA amplification reactions, such as PCR or direct sequencing. It is
understood that in certain embodiments the primers can also be extended using
non-enzymatic techniques, where for example, the nucleotides or oligonucleotides used
to extend the primer are modified such that they will chemically react to extend the
primer in a sequence specific manner. Typically the disclosed primers hybridize
with the androgen receptor gene or region of the androgen receptor gene or they
hybridize with the complement of the androgen receptor gene or complement of a
region of the androgen receptor gene.

Disclosed are primers that are capable of amplifying the CAG/CAA and/or
GGN region of the androgen receptor gene. In certain embodiments the primers
amplify the CAG/CAA region of the androgen receptor gene from nucleotide 1286
to nucleotide 1348 of the sequence set forth in SEQ ID NO:20. In certain
embodiments the primers are "outside" the CAG/CAA region of the androgen
receptor gene. By outside the region it is intended to indicate that no region of the
primer is intended to interact directly with the CAG/CAA region. For example, a
primer outside the CAG/CAA region of the sequence set forth in SEQ ID NO:20
could hybridize with nucleotide 1349 to nucleotide 1369 of SEQ ID NO:20, but it
would not be intended to hybridize with nucleotide 1348, and under the conditions
designed for the enzymatic manipulation of the primer, it would not appreciably
interact with nucleotide 1348. In other embodiments the primers are designed to
interact with one or more nucleotides considered to be part of the CAG/CAA region,
which primers herein are referred to as "inside" primers. Thus, an inside primer can
include a primer region that can under conditions appropriate for the desired
enzymatic manipulation hybridize with one or more nucleotides considered within
the CAG/CAA region and contain a region that hybridizes with nucleotides that are
not considered part of the CAG/CAA region. Thus, in one embodiment an inside
primer would interact with the nucleotide at position 1348 of SEQ ID NO:20.

In certain embodiments the primers amplify the GGN region of the androgen
receptor gene from nucleotide 2459 to nucleotide 2530 of the sequence set forth in
SEQ ID NO:20. In certain embodiments the primers are "outside" the GGN region
of the androgen receptor gene. By outside the region it is intended to indicate that
no region of the primer is intended to interact directly with the GGN region. For
example, a primer outside the GGN region of the sequence set forth in SEQ ID
NO:20 could hybridize with nucleotide 2531 to nucleotide 2551 of SEQ ID NO:20,
but it would not be intended to hybridize with nucleotide 2530, and under the
conditions designed for the enzymatic manipulation of the primer, it would not
appreciably interact with nucleotide 2530. In other embodiments the primers are
designed to interact with one or more nucleotides considered to be part of the GGN
region, which primers herein are referred to as "inside" primers. Thus, an inside
primer can include a primer region that can under conditions appropriate for the
desired enzymatic manipulation hybridize with one or more nucleotides considered
within the GGN region and contain a region that hybridizes with nucleotides that are
not considered part of the GGN region. Thus, in one embodiment an inside primer
would interact with the nucleotide at position 2530 of SEQ ID NO:20.

The size of the primers for interaction with the androgen receptor gene in 
certain embodiments can be any size that supports the desired enzymatic 
manipulation of the primer, such as DNA amplification. A typical androgen
receptor primer would be at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 
87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 125, 150, 175, 200, 225, 250,
275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 550, 600, 650, 700, 750, 800, 850,
900, 950, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3500, or 4000 
nucleotides long. 

In other embodiments an androgen receptor primer can be less than or equal 
to 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 
73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 
95, 96, 97, 98, 99, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 
425, 450, 475, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1250, 1500,
1750, 2000, 2250, 2500, 2750, 3000, 3500, or 4000 nucleotides long.

In certain embodiments the primers are designed such that they are outside 
primers whose nearest point of interaction with the androgen receptor gene is within 
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 
92, 93, 94, 95, 96, 97, 98, 99, 100, 125, 150, 175, or 200 nucleotides of the 
outermost defining nucleotide of the CAG/CAA and/or GGN region or complement 
of the CAG/CAA and/or GGN region. 

In certain embodiments the primers are designed such that they are outside
primers whose nearest point of interaction with the androgen receptor gene is at least
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100, 125, 150, 175, or 200 nucleotides away from the 
outermost defining nucleotide of the CAG/CAA and/or GGN region or complement 
of the CAG/CAA and/or GGN region. 

For example, with respect to the androgen receptor gene set forth in SEQ ID
NO:20, certain embodiments of the primers would be designed such that they are
outside primers whose nearest point of interaction with the androgen receptor gene
occurs at position 1349, 1350, 1351, 1352, 1353, 1354, 1355, 1356, 1357, 1358,
1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366, 1367, 1368, 1369, 1370, 1371,
1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1382, 1383, 1384,
1385, 1386, 1387, 1388, 1389, 1390, 1391, 1392, 1393, 1394, 1395, 1396, 1397,
1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405, 1406, 1407, 1408, 1409, 1410,
1411, 1412, 1413, 1414, 1415, 1416, 1417, 1418, 1419, 1420, 1421, 1422, 1423,
1424, 1425, 1426, 1427, 1428, 1429, 1430, 1431, 1432, 1433, 1434, 1435, 1436,
1437, 1438, 1439, 1440, 1441, 1442, 1443, 1444, 1445, 1446, 1447, 1448, 1449,
1474, 1499, 1524, or 1549 of SEQ ID NO:20.

The primers for the androgen receptor gene typically will be used to produce
an amplified DNA product that contains the CAG/CAA and/or GGN region of the
androgen receptor gene. Typically the size of the product will be such that
the size can be accurately determined to within 3, 2, or 1 nucleotides.

In certain embodiments this product is at least 20, 21, 22, 23, 24, 25, 26, 27, 
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 
72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 
94, 95, 96, 97, 98, 99, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 
400, 425, 450, 475, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1250, 
1500, 1750, 2000, 2250, 2500, 2750, 3000, 3500, or 4000 nucleotides long. 

In other embodiments the product is less than or equal to 20, 21, 22, 23, 24, 
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 
69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 
350, 375, 400, 425, 450, 475, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 
1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3500, or 4000 nucleotides 
long. 

3. Kits 

Disclosed herein are kits that are drawn to reagents that can be used in
practicing the methods disclosed herein. The kits can include any reagent or
combination of reagents discussed herein or that would be understood to be required
or beneficial in the practice of the disclosed methods. For example, the kits could
include primers to perform the amplification reactions discussed in certain
embodiments of the methods, as well as the buffers and enzymes required to use the
primers as intended. For example, disclosed is a kit for assessing a subject's risk for
acquiring prostate cancer, comprising the oligonucleotides set forth in SEQ ID NOs:
1 and 2.

C. Methods of making the compositions 

The compositions disclosed herein and the compositions necessary to
perform the disclosed methods can be made using any method known to those of
skill in the art for that particular reagent or compound unless otherwise specifically
noted. For example, the nucleic acids, such as the oligonucleotides to be used as
primers, can be made using standard chemical synthesis methods or can be produced
using enzymatic methods or any other known method. Such methods can range
from standard enzymatic digestion followed by nucleotide fragment isolation (see
for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd
Edition (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989)
Chapters 5, 6) to purely synthetic methods, for example, by the cyanoethyl
phosphoramidite method using a Milligen or Beckman System 1Plus DNA
synthesizer (for example, Model 8700 automated synthesizer of Milligen-Biosearch,
Burlington, MA or ABI Model 380B). Synthetic methods useful for making
oligonucleotides are also described by Ikuta et al., Ann. Rev. Biochem. 53:323-356
(1984), (phosphotriester and phosphite-triester methods), and Narang et al., Methods
Enzymol., 65:610-620 (1980), (phosphotriester method). Protein nucleic acid
molecules can be made using known methods such as those described by Nielsen et
al., Bioconjug. Chem. 5:3-7 (1994).

D. Methods of using the compositions

It is understood that any variation or implementation of the compositions
discussed herein is understood to be a particular embodiment which can be used and
is described as being used in the disclosed methods. Disclosed are methods for
assessing the risk that a subject may acquire prostate cancer. Methods are also
disclosed for assessing the clinical significance of the prostate cancer that a subject
will get or already has. These methods involve assaying the length of regions of
genes that have been shown herein to be related to the onset and severity of prostate
cancer. The methods involve assaying regions of both the AIB1 gene and the AR
gene either individually or collectively. There are particular regions of the AIB1
gene and the AR gene which have been shown to be related to the onset and severity
of prostate cancer and these regions are found in all forms and variants of these
genes. In one embodiment of the disclosed methods the length of the CAG/CAA
repeat in either the AIB1 gene or AR gene of a subject, discussed above, is
determined. This length can then be compared to known lengths, which have been
shown to be correlated with the onset and severity of prostate cancer. For example,
in the AIB1 gene, the CAG/CAA region is determined to be greater than, less than, or
equal to 29 repeats. When the number of repeats present in the repeat region of the
subject is less than or more than 29 in at least one allele of the subject's AIB1 gene,
this person is determined to have an increased risk of developing prostate cancer. It
is understood that since the AIB1 gene is present on chromosome 20, subjects will
have two alleles of this gene. When a single allele has a length different than 29 the
subject has an increased risk of acquiring prostate cancer. If, however, the subject
has two alleles of the AIB1 gene that both have lengths in the CAG/CAA repeat
region different than 29, the subject has a higher risk of acquiring prostate cancer
than if only one allele has a CAG/CAA length different than 29. Thus, if the alleles
of the AIB1 gene are assayed individually and the first allele assayed has a CAG/CAA
repeat length different than 29, a certain amount of information as to the person's
susceptibility will be gained. However, if the allele contains a CAG/CAA repeat
equal to 29, the second allele must be looked at to get meaningful information as to
the person's likelihood of acquiring prostate cancer. Furthermore, more information
will be gained by assaying the second allele even if the first allele has a CAG/CAA
length less than or greater than 29 because the second allele may also have a
CAG/CAA length different than 29 and this provides more information about the
subject's prostate cancer susceptibility. Thus, it is preferred that both alleles of the
AIB1 gene are assayed when performing the methods herein.
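
The two-allele comparison against 29 repeats can be sketched as follows. This is an illustrative sketch only: the function names and the category labels ("baseline", "increased", "higher") are hypothetical and simply mirror the relative ordering described in the text; they are not quantitative risk estimates.

```python
AIB1_REFERENCE_REPEATS = 29  # comparison length stated in the text


def count_differing_alleles(allele_repeats, reference=AIB1_REFERENCE_REPEATS):
    """Count how many of the two AIB1 alleles differ from the reference length."""
    if len(allele_repeats) != 2:
        raise ValueError("expected repeat counts for exactly two alleles")
    return sum(1 for repeats in allele_repeats if repeats != reference)


def aib1_category(allele_repeats):
    """Map the number of differing alleles to a hypothetical ordinal label."""
    labels = {0: "baseline", 1: "increased", 2: "higher"}
    return labels[count_differing_alleles(allele_repeats)]


print(aib1_category((29, 29)))
print(aib1_category((28, 29)))
print(aib1_category((26, 30)))
```

Assaying both alleles matters in this sketch for the same reason given in the text: a first allele of 29 repeats alone cannot distinguish the "baseline" case from the "increased" case.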

The CAG/CAA region of the subject's AR gene can also be assayed.
However, for the AR gene the length of the subject's CAG/CAA region is
compared to 23 repeats. If there are more or fewer than 23 repeats the subject
has an increased risk of prostate cancer. Furthermore, the risk of prostate cancer in a
subject is directly correlated with the length of the subject's CAG/CAA repeat. The
greater the deviation of the number of CAG/CAA repeats in a subject's DNA from 23
repeats the greater the likelihood that the person will acquire prostate cancer.

1. Methods assaying the CAG/CAA repeat of the AIB1 gene

Methods are disclosed where only the length of CAG/CAA repeat in a
subject's AIB1 gene is assayed. Disclosed are methods for assessing the risk of
developing prostate cancer in a human by analyzing the AIB1 gene of the subject.

Disclosed are methods for assessing the risk of prostate cancer in a human 
subject comprising determining the length of the contiguous CAG or CAA repeats in 
both AIB1 gene alleles of the subject and assessing whether the length of the CAG
or CAA repeats is less than, equal to, or greater than 29 repeats, a length less than or
greater than 29 repeats in both alleles indicating an increased risk of prostate cancer 
in the subject. 

Also disclosed are methods where determining the length of the repeats
comprises amplifying a region of both AIB1 gene alleles comprising the contiguous
CAG or CAA repeat. Any type of amplification method can be employed to amplify
the target regions. Preferred methods include PCR and direct sequencing of the
target repeat regions.

When two repeat regions of two alleles are amplified two products will be
produced. These products can then be assayed for their length by, for example, gel
chromatography, HPLC, or capillary gel electrophoresis. Thus, disclosed are
methods that produce two PCR products and methods where the PCR products are
analyzed by chromatography, including but not limited to gel electrophoresis.
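
One way a measured product length could be converted to a repeat count is sketched below. It assumes the product consists of the repeat region plus flanking sequence of known total length; the 100-nucleotide flank used here is a purely hypothetical value, not a property of the disclosed primers, and the function name is likewise an assumption. Rounding to the nearest whole repeat tolerates a sizing error of one nucleotide.

```python
def repeats_from_product_length(product_length, flank_length, unit_length=3):
    """Estimate the repeat count from an amplified product length.

    `flank_length` is the total non-repeat sequence in the product (primers
    plus any flanking sequence). Rounding to the nearest unit absorbs a
    sizing error of up to one nucleotide for trinucleotide repeats.
    """
    repeat_nt = product_length - flank_length
    if repeat_nt < 0:
        raise ValueError("product is shorter than its flanking sequence")
    return round(repeat_nt / unit_length)


HYPOTHETICAL_FLANK = 100  # illustrative value only

# Two products, one per allele, sized by electrophoresis or chromatography.
allele_lengths = [187, 181]
allele_repeats = [repeats_from_product_length(n, HYPOTHETICAL_FLANK) for n in allele_lengths]

print(allele_repeats)
```
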

In certain embodiments the sequence of the repeat region can be determined
by, for example, direct sequencing of the repeat region. In some embodiments the
PCR product or other DNA amplification product can itself be sequenced.

In certain embodiments of the disclosed methods, the repeat region is
assayed or amplified by targeting primers or probes to the region or the areas of the
target gene surrounding the region. These primers, for example, can be used to
amplify the region by, for example, PCR. It is preferred that the method utilize
primers as they are discussed herein. For example, the PCR product can be
produced using a first AIB1 primer that selectively hybridizes with the complement
of sequence 5' to the repeat region and a second AIB1 primer that selectively
hybridizes with sequence 3' to the repeat region.

In certain embodiments, the PCR product can be produced using a first AIB1
primer that selectively hybridizes with the sequence set forth in SEQ ID NO:3 and a
second AIB1 primer that selectively hybridizes with the sequence set forth in SEQ
ID NO:4. In still other embodiments of the disclosed methods, the first AIB1 primer
has the sequence set forth in SEQ ID NO:1 and the second AIB1 primer has the
sequence set forth in SEQ ID NO:2.

In certain embodiments determining how many CAG or CAA repeats there 
are comprises sequencing the CAG or CAA repeats. 
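
When the repeat region has been sequenced, the number of contiguous CAG or CAA repeats can be counted directly from the read. The following is a minimal sketch, assuming the read is available as a plain string of bases; the function name and the example reads are hypothetical.

```python
import re

# One or more back-to-back CAG or CAA triplets.
_REPEAT_RUN = re.compile(r"(?:CA[AG])+")


def longest_cag_caa_run(sequence):
    """Return the number of triplets in the longest contiguous CAG/CAA run."""
    best = 0
    for match in _REPEAT_RUN.finditer(sequence.upper()):
        run_length = (match.end() - match.start()) // 3
        best = max(best, run_length)
    return best


# Hypothetical reads used only to exercise the function.
print(longest_cag_caa_run("TTCAGCAACAGGG"))
print(longest_cag_caa_run("ttcagcagTTcaacagcagcagA"))
```
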

2. Methods assaying the CAG/CAA repeat of the Androgen 
receptor gene 

Just as with the AIB1 gene, there are repeat regions within the androgen
receptor gene that can be assayed and used to assess a person's risk for acquiring
prostate cancer. There are two repeat regions that can be assayed within the
androgen receptor gene. The CAG/CAA repeat region within the AR gene can be
assayed as discussed for the CAG/CAA repeat region of the AIB1 gene. There is a
key difference, however, and that is the length of the subject's CAG/CAA repeat
region in the AR gene is compared to 23 repeats, not 29 repeats. The predictability
of acquiring prostate cancer arises from a comparison to 23 CAG/CAA repeats.
Furthermore, the CAG/CAA repeat within the AR gene is predictive based on single
repeat changes. Thus, the likelihood of acquiring prostate cancer increases for each
difference in length from 23 repeats. Thus, for example, a subject having 15 repeats
is more likely to acquire prostate cancer or have a clinically significant prostate
cancer than a person having 16 repeats, a subject having 16 repeats is more likely
to acquire prostate cancer or have a clinically significant prostate cancer than a
person having 17 repeats, a subject having 17 repeats is more likely to acquire
prostate cancer or have a clinically significant prostate cancer than a person having
18 repeats, and so forth. This relationship holds for all lengths of CAG/CAA repeats
in the AR gene.

The AR gene also has a GGN repeat region whose length is related to the 
susceptibility to prostate cancer. The methods and reagents disclosed for the 
CAG/CAA repeats of the AR gene apply to the GGN repeat within the AR gene and 
are all specifically contemplated herein. 

3. Methods assaying the CAG/CAA repeat of the Androgen 
receptor gene and the AIB1 gene 
Methods are also disclosed where both the AIB1 gene and the AR gene are 
assayed to arrive at a likelihood that a subject will acquire prostate cancer. It is 
understood that all of the conditions and permutations of performing the methods 
that involve assaying either AIB1 or AR alone can also be used and applied when 
both AIB1 and AR are assayed. 

Disclosed are methods for assessing the risk of developing prostate cancer in 
a human by analyzing the AIB1 gene and the AR gene of the subject together. 

For example, disclosed are methods for assessing the risk of prostate cancer 
in a human subject comprising determining the length of the contiguous CAG or 
CAA repeats in the AIB1 gene alleles of the subject and assessing whether the 
length of the CAG or CAA repeats in each allele is less than, equal to, or greater 
than 29 repeats, and determining the length of the contiguous CAG or CAA repeats 
in the androgen receptor gene of the subject and assessing whether the length of the 
CAG or CAA repeats is less than, equal to, or greater than 23 repeats, a length in at 
least one allele less than or greater than 29 repeats in the AIB1 gene and less than or 
greater than 23 repeats in the androgen receptor gene indicating an increased risk of 
prostate cancer in the subject. 
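
The comparison logic of the combined method just described can be sketched in a few lines of Python. This is an illustration only, not part of the disclosed methods: the function and constant names are hypothetical, the reference lengths of 29 (AIB1) and 23 (AR) are taken from the text, and a single AR repeat length per subject is assumed.

```python
# Reference repeat lengths taken from the text; all names here are hypothetical.
AIB1_REFERENCE = 29  # CAG/CAA repeats per AIB1 allele
AR_REFERENCE = 23    # CAG/CAA repeats in the androgen receptor gene


def compare(length, reference):
    """Report whether a repeat length is less than, equal to, or greater than a reference."""
    if length < reference:
        return "less"
    if length > reference:
        return "greater"
    return "equal"


def indicates_increased_risk(aib1_alleles, ar_length):
    """Apply the stated rule: at least one AIB1 allele differs from 29 repeats
    AND the AR repeat length differs from 23 repeats."""
    aib1_differs = any(n != AIB1_REFERENCE for n in aib1_alleles)
    ar_differs = ar_length != AR_REFERENCE
    return aib1_differs and ar_differs


# Example subject: AIB1 alleles of 28 and 29 repeats, AR length of 21 repeats.
print([compare(n, AIB1_REFERENCE) for n in (28, 29)])  # ['less', 'equal']
print(indicates_increased_risk((28, 29), 21))          # True
```

The sketch returns only a yes/no indication under the stated rule; it does not estimate the size of the risk.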

Also disclosed are methods wherein determining the length of the contiguous 
CAG or CAA repeats in the AIB1 gene alleles comprises amplifying a region of the 
AIB1 gene alleles comprising the CAG or CAA repeats and wherein determining the 
length of the CAG or CAA repeats in the androgen receptor gene comprises 
amplifying a region of the androgen receptor gene comprising the CAG or CAA 
repeats. 

Furthermore, methods are disclosed wherein the amplification of the regions 
of the AIB1 gene and the androgen receptor gene is by PCR that produces a first and 
a second AIB1 PCR product and an androgen receptor PCR product. 

Disclosed are methods further comprising analyzing the PCR products by 
chromatography and methods wherein the chromatography is gel electrophoresis. 
The sequence of the PCR products can be determined. 

The methods involving amplifying repeat regions in both the AIB1 and the AR 
genes can, for example, involve primers. It is understood that primers specific for 
each repeat region would be used. For example, the AIB1 PCR product is produced 
using a first AIB1 primer that selectively hybridizes with the sequence set forth in 
SEQ ID NO:3 and a second AIB1 primer that selectively hybridizes with the 
sequence set forth in SEQ ID NO:4. In additional methods, the first AIB1 
primer has the sequence set forth in SEQ ID NO:1 and the second AIB1 primer has 
the sequence set forth in SEQ ID NO:2. 

Or, when assaying both genes, the androgen receptor PCR product is produced 
using a first androgen receptor CAG primer that selectively hybridizes with the 
sequence set forth in SEQ ID NO:9 and a second androgen receptor CAG primer 
that selectively hybridizes with the sequence set forth in SEQ ID NO:10. In some 
embodiments the first androgen receptor CAG primer has the sequence set forth in 
SEQ ID NO:7 and the second androgen receptor CAG primer has the sequence set 
forth in SEQ ID NO:8. 

Or, when both the AIB1 and the AR genes are assayed, the AIB1 PCR 
product is produced using a first AIB1 primer that selectively hybridizes with the 
sequence set forth in SEQ ID NO:3 and a second AIB1 primer that selectively 
hybridizes with the sequence set forth in SEQ ID NO:4, and the androgen 
receptor PCR product is produced using a first androgen receptor CAG primer that 
selectively hybridizes with the sequence set forth in SEQ ID NO:9 and a second 
androgen receptor CAG primer that selectively hybridizes with the sequence set 
forth in SEQ ID NO:10. 

In certain embodiments the first AIB1 primer has the sequence set forth in 
SEQ ID NO:1 and the second AIB1 primer has the sequence set forth in SEQ ID 
NO:2 and wherein the first androgen receptor CAG primer has the sequence set forth 
in SEQ ID NO:7 and the second androgen receptor CAG primer has the sequence set 
forth in SEQ ID NO:8. 

In some embodiments the methods are performed wherein more than 29 
contiguous CAG or CAA repeats in at least one allele of the AIB1 gene of the 
person and more than 23 contiguous CAG or CAA repeats in the androgen receptor 
gene of the person indicates an increased risk of prostate cancer in the subject. 

In other embodiments the methods are performed wherein more than 29 
contiguous CAG or CAA repeats in at least one allele of the AIB1 gene of the 
person and less than 23 contiguous CAG or CAA repeats in the androgen receptor 
gene of the person indicates an increased risk of prostate cancer in the subject. 

In some embodiments the methods are performed wherein less than 29 
contiguous CAG or CAA repeats in at least one allele of the AIB1 gene of the 
person and more than 23 contiguous CAG or CAA repeats in the androgen receptor 
gene of the person indicates an increased risk of prostate cancer in the subject. 

In other embodiments the methods are performed wherein less than 29 
contiguous CAG or CAA repeats in at least one allele of the AIB1 gene of the 
person and less than 23 contiguous CAG or CAA repeats in the androgen receptor 
gene of the person indicates an increased risk of prostate cancer in the subject. 

In other embodiments the methods are performed wherein more or less than 
29 contiguous CAG or CAA repeats in both alleles of the AIB1 gene of the person 
and more than 23 contiguous CAG or CAA repeats in the androgen receptor gene of 
the person indicates an increased risk of prostate cancer in the subject. 

In other embodiments the methods are performed wherein more or less than 
29 contiguous CAG or CAA repeats in both alleles of the AIB1 gene of the person 
and less than 23 contiguous CAG or CAA repeats in the androgen receptor gene of 
the person indicates an increased risk of prostate cancer in the subject. 

In other embodiments the methods are performed wherein more than 29 
contiguous CAG or CAA repeats in both alleles of the AIB1 gene of the person and 
more than 23 contiguous CAG or CAA repeats in the androgen receptor gene of the 
person indicates an increased risk of prostate cancer in the subject. 

In other embodiments the methods are performed wherein less than 29 
contiguous CAG or CAA repeats in both alleles of the AIB1 gene of the person and 
less than 23 contiguous CAG or CAA repeats in the androgen receptor gene of the 
person indicates an increased risk of prostate cancer in the subject. 

Also disclosed are methods for assessing the risk of prostate cancer in a 
human subject comprising determining the length of the contiguous CAG or CAA 
repeats in the AIB1 gene alleles of the subject and assessing whether the length of 
the CAG or CAA repeats in each allele is less than, equal to, or greater than 29 
repeats, and determining the length of the contiguous GGN repeats in the androgen 
receptor gene of the subject and assessing whether the length of the GGN repeats is 
less than, equal to, or greater than 23 repeats, a length in at least one allele less than 
or greater than 29 repeats in the AIB1 gene and less than or greater than 23 repeats 
in the androgen receptor gene indicating an increased risk of prostate cancer in the 
subject, wherein N is either T, G, or C. 

Disclosed are methods wherein determining the length of the contiguous 
CAG or CAA repeats in the AIB1 gene alleles comprises amplifying a region of the 
AIB1 gene alleles comprising the CAG or CAA repeats and wherein determining the 
length of the contiguous GGN repeats in the androgen receptor gene comprises 
amplifying a region of the androgen receptor gene comprising the GGN repeats. 

Further methods are disclosed wherein the amplification of the regions of the 
AIB1 gene and the androgen receptor gene is by PCR that produces a first and a 
second AIB1 PCR product and an androgen receptor PCR product. 

In other methods the amplification further comprises analyzing the PCR 
products by chromatography. 

The amplification products including the PCR products can be analyzed by 
gel electrophoresis. 

In other embodiments the sequence of the PCR products is determined. 

Disclosed are amplification methods wherein the AIB1 PCR product is 
produced using a first AIB1 primer that selectively hybridizes with the sequence set 
forth in SEQ ID NO:3 and a second AIB1 primer that selectively hybridizes with the 
sequence set forth in SEQ ID NO:4. 

Also disclosed are amplification methods wherein the first AIB1 primer has 
the sequence set forth in SEQ ID NO:1 and the second AIB1 primer has the 
sequence set forth in SEQ ID NO:2. 

Disclosed are methods wherein the androgen receptor PCR product is 
produced using a first androgen receptor GGN primer that selectively hybridizes 
with the sequence set forth in SEQ ID NO:13 and a second androgen receptor GGN 
primer that selectively hybridizes with the sequence set forth in SEQ ID NO:14. 

Also disclosed are methods wherein the first androgen receptor GGN primer 
has the sequence set forth in SEQ ID NO:11 and the second androgen receptor GGN 
primer has the sequence set forth in SEQ ID NO:12. 

Further disclosed are methods wherein the AIB1 PCR product is produced 
using a first AIB1 primer that selectively hybridizes with the sequence set forth in 
SEQ ID NO:3 and a second AIB1 primer that selectively hybridizes with the 
sequence set forth in SEQ ID NO:4, and wherein the androgen receptor PCR product 
is produced using a first androgen receptor GGN primer that selectively hybridizes 
with the sequence set forth in SEQ ID NO:13 and a second androgen receptor GGN 
primer that selectively hybridizes with the sequence set forth in SEQ ID NO:14. 

Also disclosed are methods wherein the first AIB1 primer has the sequence 
set forth in SEQ ID NO:1 and the second AIB1 primer has the sequence set forth in 
SEQ ID NO:2 and wherein the first androgen receptor GGN primer has the sequence 
set forth in SEQ ID NO:11 and the second androgen receptor GGN primer has the 
sequence set forth in SEQ ID NO:12. 

In some embodiments the methods are performed wherein more than 29 
contiguous CAG or CAA repeats in at least one allele of the AIB1 gene of the 
person and more than 23 contiguous GGN repeats in the androgen receptor gene of 
the person indicates an increased risk of prostate cancer in the subject. 

In other embodiments the methods are performed wherein more than 29 
contiguous CAG or CAA repeats in at least one allele of the AIB1 gene of the 
person and less than 23 contiguous GGN repeats in the androgen receptor gene of 
the person indicates an increased risk of prostate cancer in the subject. 

In some embodiments the methods are performed wherein less than 29 
contiguous CAG or CAA repeats in at least one allele of the AIB1 gene of the 
person and more than 23 contiguous GGN repeats in the androgen receptor gene of 
the person indicates an increased risk of prostate cancer in the subject. 

In other embodiments the methods are performed wherein less than 29 
contiguous CAG or CAA repeats in at least one allele of the AIB1 gene of the 
person and less than 23 contiguous GGN repeats in the androgen receptor gene of 
the person indicates an increased risk of prostate cancer in the subject. 

In some embodiments the methods are performed wherein more or less than 
29 contiguous CAG or CAA repeats in both alleles of the AIB1 gene of the person 
and more than 23 contiguous GGN repeats in the androgen receptor gene of the 
person indicates an increased risk of prostate cancer in the subject. 

In other embodiments the methods are performed wherein more or less than 
29 contiguous CAG or CAA repeats in both alleles of the AIB1 gene of the person 
and less than 23 contiguous GGN repeats in the androgen receptor gene of the 
person indicates an increased risk of prostate cancer in the subject. 

In other embodiments the methods are performed wherein more than 29 
contiguous CAG or CAA repeats in both alleles of the AIB1 gene of the person and 
more than 23 contiguous GGN repeats in the androgen receptor gene of the person 
indicates an increased risk of prostate cancer in the subject. 

In other embodiments the methods are performed wherein less than 29 
contiguous CAG or CAA repeats in both alleles of the AIB1 gene of the person and 
less than 23 contiguous GGN repeats in the androgen receptor gene of the person 
indicates an increased risk of prostate cancer in the subject. 

Also disclosed are methods for assessing the risk of prostate cancer in a 
human subject comprising determining the length of the contiguous CAG or CAA 
repeats in an AIB1 gene of the subject and assessing whether the length of the CAG 
repeats is less than, equal to, or greater than 29 repeats, a length less than or greater 
than 29 repeats indicating an increased risk of prostate cancer in the subject. 

Methods for assessing the risk of prostate cancer in a human subject 
comprising determining the length of the contiguous CAG or CAA repeats in an 
AIB1 gene of the subject and assessing whether the length of the CAG or CAA 
repeats is less than, equal to, or greater than 29 repeats, and determining the length 
of the contiguous CAG or CAA repeats in the androgen receptor gene of the subject 
and assessing whether the length of the CAG or CAA repeats is less than, equal to, 
or greater than 23 repeats, a length less than or greater than 29 repeats in the AIB1 
allele and less than or greater than 23 repeats in the androgen receptor gene 
indicating an increased risk of prostate cancer in the subject are also disclosed. 

Disclosed are methods for assessing the risk of prostate cancer in a human 
subject comprising determining the length of the contiguous CAG or CAA repeats in 
an AIB1 gene of the subject and assessing whether the length of the CAG or CAA 
repeats is less than, equal to, or greater than 29 repeats, and determining the length 
of the contiguous GGN repeats in the androgen receptor gene of the subject and 
assessing whether the length of the GGN repeats is less than, equal to, or greater 
than 23 repeats, a length less than or greater than 29 repeats in the AIB1 allele and 
less than or greater than 23 repeats in the androgen receptor gene indicating an 
increased risk of prostate cancer in the subject, wherein N is either T, G, or C. 

Throughout this application, various publications are referenced. The 
disclosures of these publications in their entireties are hereby incorporated by 
reference into this application in order to more fully describe the state of the art to 
which this invention pertains. The references disclosed are also individually and 
specifically incorporated by reference herein for the material contained in them that 
is discussed in the sentence in which the reference is relied upon. 

It will be apparent to those skilled in the art that various modifications and 
variations can be made in the present invention without departing from the scope or 
spirit of the invention. Other embodiments of the invention will be apparent to those 
skilled in the art from consideration of the specification and practice of the invention 
disclosed herein. It is intended that the specification and examples be considered as 
exemplary only, with a true scope and spirit of the invention being indicated by the 
following claims. 

E. Examples 

The following examples are put forth so as to provide those of 
ordinary skill in the art with a complete disclosure and description of how the 
compounds, compositions, articles, devices and/or methods claimed herein are made 
and evaluated, and are intended to be purely exemplary of the invention and are not 
intended to limit the scope of what the inventors regard as their invention. Efforts 
have been made to ensure accuracy with respect to numbers (e.g., amounts, 
temperature, etc.), but some errors and deviations should be accounted for. Unless 
indicated otherwise, parts are parts by weight, temperature is in °C or is at ambient 
temperature, and pressure is at or near atmospheric. 

1. Example 1 

Because AR coactivators enhance transactivation of AR, the relationship of a 
CAG/CAA repeat length polymorphism in the AIB1/SRC-3 gene (amplified in breast 
cancer gene 1, a steroid receptor coactivator and an AR coactivator) with prostate 
cancer risk in a multidisciplinary population-based case-control study in China was 
evaluated. Genomic DNA of 189 prostate cancer patients and 301 healthy controls 
was used for the PCR-based assay. The AIB1/SRC-3 CAG/CAA repeat length for all 
alleles ranged from 24 to 32, with the most common repeat length being 29. 
Homozygous 29/29 and heterozygous 28/29 were the most common genotypes, with 
44% and 30% of the controls harboring these genotypes, respectively. Relative to 
subjects homozygous for ≥29 CAG repeats (≥29/≥29 genotype), individuals carrying 
one shorter allele (≥29/<29 genotype) had a 26% increased risk (OR = 1.26, 95% CI 
= 0.85-1.86), while those homozygous for the shorter allele (<29/<29 genotype) had 
a 73% excess risk (OR = 1.73, 95% CI = 0.96-3.11). The combined effect of CAG 
repeat lengths in the AR and AIB1/SRC-3 genes was also evaluated. Relative to men 
with both the ≥29/≥29 genotype of the AIB1/SRC-3 gene and a long CAG repeat 
length (≥23) in the AR gene, those with both the <29/<29 AIB1/SRC-3 genotype and 
a short CAG repeat length in the AR gene (<23) had a 2.9-fold risk (OR = 2.86, 95% 
CI = 1.28-6.40). A similar pattern was seen for the combined effects of the AIB1/SRC-3 
marker with the GGN marker of the AR gene. Together, these data indicate that the 
CAG/CAA repeat length in the AIB1/SRC-3 gene is associated with prostate cancer 
risk in Chinese men and that the combination of CAG/CAA repeat lengths in both 
the AIB1/SRC-3 and AR genes provides a useful marker for clinically significant 
prostate cancer. 

a) Materials and methods 

(1) Study subjects 

Details of the study have been described previously (9,16-18 AIB1), which 
are herein incorporated by reference for at least material related to the 
epidemiological study. Cases of primary prostate cancer (ICD9 185) newly 
diagnosed between 1993 and 1995 were identified through a rapid reporting system 
established between the Shanghai Cancer Institute and 16 collaborating hospitals in 
urban Shanghai. Cases were permanent residents in 10 urban districts of Shanghai 
(henceforth referred to as Shanghai) who had no history of other cancer. In contrast to 
many Western countries, prostate cancer screening is not widespread in China; 
therefore cases in this study were clinically significant prostate cancers that 
presented with symptoms. 

Based on the personal registry cards of all adults over age 18 residing in 
urban Shanghai (maintained at the Shanghai Resident Registry), male population 
controls were selected randomly from the 6.5 million permanent residents of 
Shanghai and frequency-matched to the expected age distribution (5-year category) 
of prostate cancer cases. Included controls were negative for prostate cancer based 
on digital rectal exam and trans-rectal ultrasound. 

Information on potential risk factors was elicited through an in-person 
interview by trained interviewers using a structured questionnaire. The interview 
included information on demographic characteristics; dietary and smoking history; 
consumption of alcohol and other beverages; medical history; family history of 
cancer; physical activity; body size; and sexual behavior. Of the 268 eligible cases 
(95% of the cases diagnosed in Shanghai during the study period), 243 (91%) were 
interviewed. After a consensus review by both the Chinese and American 
pathologists, four cases were classified as having benign prostatic hyperplasia and 
excluded from the study. Of the 495 eligible controls, 472 (95%) were interviewed. 
Most non-responses were due to refusal. 

b) Blood collection and DNA extraction 

Two hundred cases (82% of those interviewed) and 330 controls (70%) 
provided 20 ml of fasting blood for the study. The blood samples were processed at 
a central laboratory in Shanghai. The buffy coat samples were first stored at -70°C 
and then shipped to the U.S. on dry ice for DNA extraction at the American Type 
Culture Collection (Manassas, VA), using a standard DNA extraction protocol. 
Quality control procedures showed no evidence of contamination, and DNA purity 
and length were satisfactory. After DNA extraction, 190 cases and 305 controls had 
sufficient DNA for genotyping. DNA samples were arranged in case-control 
pairs/triplets to minimize day-to-day laboratory variation, and laboratory personnel 
were masked to case-control status. 

c) Genotyping 

The AIB1/SRC-3 gene. The polyglutamine region of the AIB1/SRC-3 
protein is encoded by two glutamine codons in the AIB1/SRC-3 gene on 
chromosome 20 (GenBank accession number AF012108): CAG and CAA. The 
usual sense codon sequence of the polyglutamine stretch is (CAG)n CAA (CAG)n 
(CAA CAG)4 CAG CAA (CAG)2 CAA (SEQ ID NO:5). The two variable-length 
tracts of CAG repeats ((CAG)n) usually contain six repeats between nucleotides 
3930 and 3947, and nine repeats between nucleotides 3951 and 3977, for a total 
repeat length of 29 (19 AIB1) (which is herein incorporated by reference for material 
related to the CAG repeat). This polymorphism has previously been described by 
Shirazi et al. While Shirazi et al. scored genotypes of this marker using only 
the two variable (CAG)n stretches (19 AIB1), herein the data were scored as the total 
number of continuous CAG and CAA triplets in the entire polyglutamine region of 
the AIB1/SRC-3 gene, as has been done more recently (20,21 AIB1) (both of which 
are herein incorporated by reference for material related to the CAG and CAA 
repeats). 
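
The scoring rule described above, counting every continuous CAG or CAA triplet in the polyglutamine region, can be checked against the usual codon sequence with a short Python sketch. The function name and the flanking bases are hypothetical and serve only as illustration; the repeat structure (six and nine CAG repeats in the two variable tracts) is the one stated in the text.

```python
def longest_codon_run(seq, codons):
    """Length, in triplets, of the longest contiguous run of the given codons,
    checked in every reading frame."""
    seq = seq.upper()
    run = [0] * (len(seq) + 3)  # run[i] = run length starting at position i
    best = 0
    for i in range(len(seq) - 3, -1, -1):
        if seq[i:i + 3] in codons:
            run[i] = 1 + run[i + 3]
            best = max(best, run[i])
    return best


# Usual polyglutamine stretch described in the text, with n = 6 and n = 9:
# (CAG)6 CAA (CAG)9 (CAA CAG)4 CAG CAA (CAG)2 CAA
usual = ("CAG" * 6 + "CAA" + "CAG" * 9 + "CAACAG" * 4
         + "CAG" + "CAA" + "CAG" * 2 + "CAA")

# Hypothetical flanking bases, added only to show that they are not counted.
amplicon = "TTGACC" + usual + "GGTACT"

print(longest_codon_run(amplicon, {"CAG", "CAA"}))  # 29
```

The count is 6 + 1 + 9 + 8 + 1 + 1 + 2 + 1 = 29 triplets, which matches the total repeat length of 29 given for the usual allele.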

The number of CAG/CAA repeats in the polyglutamine stretch of the 
AIB1/SRC-3 gene was determined by amplifying the gene's C-terminal 
polyglutamine region in each sample using custom flanking primers 
(5'-TCATCACCTCCGACAACAGAGG-3' (SEQ ID NO:1) and 
5'-TATGGAAACTGTTGCGGAGGAG-3' (SEQ ID NO:2)) and the Advantage 2 
Polymerase System (Clontech). The number of CAG/CAA repeats was determined 
by electrophoresis of the PCR products on an acrylamide gel and comparison to 
molecular weight standards. For confirmation, PCR products from selected samples 
were subsequently purified, using the PCR Product Purification Kit (Qiagen), and 
sequenced directly, using the Big Dye Terminator Cycle Sequencing Ready Reaction 
Kit (PE Biosystems). 

The AR gene. Genotypes of both the CAG (polyglutamine) and GGN 
(polyglycine) repeat length polymorphisms in exon 1 of the AR gene (located on the 
X chromosome) were determined as described previously (22 AIB1) (which is 
herein incorporated by reference for material related to the AR gene). Briefly, two 
sets of oligonucleotide primers flanking each of the two polymorphic regions for use 
in DNA amplification and direct sequencing were designed. For the polyglutamine 
stretch, the number of continuous CAG or CAA triplets was counted directly, while 
for the polyglycine stretch, the number of continuous GGN repeats (where N 
represents T, G, or C) was counted directly. 
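
Counting continuous GGN repeats in a sequencing read can be sketched with a regular expression. This is a minimal illustration, assuming N is restricted to T, G, or C as stated above; the function name and the example sequences are hypothetical and are not taken from the AR gene.

```python
import re

# A lookahead makes the search start at every position, so runs in any
# reading frame are considered. N is limited to T, G, or C per the text.
GGN_RUN = re.compile(r"(?=((?:GG[TGC])+))")


def longest_ggn_run(seq):
    """Number of triplets in the longest contiguous run of GGT/GGG/GGC."""
    runs = [m.group(1) for m in GGN_RUN.finditer(seq.upper())]
    return max((len(r) // 3 for r in runs), default=0)


# Hypothetical read: GGT GGC GGG GGC flanked by unrelated bases.
print(longest_ggn_run("AATGGTGGCGGGGGCTTT"))  # 4
# GGA is not counted, because A is not an allowed value of N here.
print(longest_ggn_run("GGAGGTGGC"))           # 2
```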

Quality control. Because the PCR procedure is prone to contamination, a 
negative water-blank control was always included in each batch of the PCR 
reactions (usually 9-18 DNA samples plus one negative control). If the negative 
control was shown to be positive, the assay was repeated for the entire batch. 
Twenty-four split samples from a single individual were spaced at intervals among 
the study samples to assess the reproducibility of genotyping. For the AIB1/SRC-3 
gene, all of the 24 split samples had a CAG/CAA repeat length of 29 in both alleles. 
Of the 21 split samples with AR CAG results, 19 (90%) had the same repeat number 
(repeat length = 23); one had one more, and one had one less repeat. Of the 20 
samples with AR GGN results, 19 (95%) had the same repeat number (repeat 
length = 23) and one had one less repeat. 

Statistical analysis. Unconditional logistic regression models were used to 
derive odds ratios (ORs) and corresponding 95% confidence intervals (CIs) to 
estimate the prostate cancer risks associated with AIB1/SRC-3 genotypes (23 AIB1) 
(which is herein incorporated by reference for material related to statistical analysis). 
The distribution of the CAG/CAA repeat lengths among controls 
was used to derive the median cutoffs used to calculate the ORs. Because the 
AIB1/SRC-3 gene is located on chromosome 20, each individual carries two alleles. 
In contrast, since the AR gene is located on the X chromosome, there is only one 
allele for each individual. Based on the median repeat lengths of the AR CAG and 
the AR GGN polymorphisms in this Chinese population (9 AIB1), subjects in the 
current AIB1/SRC-3 analysis were grouped by <23 versus ≥23 repeats in analyses 
stratified by each of the two AR gene polymorphisms. The level of significance for 
all results reported herein is 0.05. 

d) Results 

Age at diagnosis ranged from 50 to 94 (median 73) for cancer cases. Due to 
the lack of widespread prostate cancer screening in China, cases in this study were 
mostly men with clinically significant prostate cancer. Accordingly, about 
two-thirds of the cases were diagnosed as having advanced (regional/remote stages) 
cancer, and most tumors were moderately or poorly differentiated. Most cases were 
symptomatic at diagnosis, and 77% had serum prostatic specific antigen levels 
greater than 10 ng/ml (median 87 ng/ml). Compared to population controls, cases 
had significantly higher caloric intake; had significantly larger waist-to-hip ratios; 
and were somewhat less likely to be married, have attended college, or be smokers 
or drinkers, though not significantly so (data not shown). 

The distribution of the alleles and genotypes of the AIB1/SRC-3 gene 
CAG/CAA repeat length marker by case-control status is shown in Table 1. Among 
controls, the CAG/CAA repeat length for all alleles ranged from 24 to 32, with 29, 
28, and 26 being the most common repeat lengths (65.6, 23.5, and 6.5%, 
respectively). Eighty-eight percent of the controls had at least one 29 allele. 
Homozygous 29/29 and heterozygous 28/29 were the most common genotypes, with 
44% and 30% of the controls harboring these genotypes, respectively. The observed 
genotype frequencies were in close agreement with those predicted from the allele 
frequencies under Hardy-Weinberg equilibrium. 
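
The agreement with Hardy-Weinberg equilibrium can be illustrated with the control allele frequencies quoted above (65.6% for the 29 allele and 23.5% for the 28 allele). The following Python sketch is a worked check, not the analysis performed in the study; the function name is hypothetical.

```python
def hardy_weinberg_expected(p, q):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium:
    p*p for the p/p homozygote and 2*p*q for the p/q heterozygote."""
    return p * p, 2 * p * q


freq_29 = 0.656  # control frequency of the 29-repeat allele, from the text
freq_28 = 0.235  # control frequency of the 28-repeat allele, from the text

hom_29, het_28_29 = hardy_weinberg_expected(freq_29, freq_28)
print(f"expected 29/29: {hom_29:.3f}")     # expected 29/29: 0.430
print(f"expected 28/29: {het_28_29:.3f}")  # expected 28/29: 0.308
```

The expected values of about 43% and 31% are close to the observed 44% and 30% of controls with the 29/29 and 28/29 genotypes.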

Table 1. Frequencies of AIB1/SRC-3 CAG/CAA repeat length alleles and 
genotypes in prostate cancer cases and controls, China 

Genotype    Cases           Controls 
            n      %        n      % 
29/29       67     35.3     130    43.6 
29/30        7      3.7      10     3.4 
29/31        0      0.0       1     0.3 
29/32        2      1.1       0     0.0 
Total      190              298 





Relative to men homozygous for 29 CAG/CAA repeats in the AIB1/SRC-3 
gene (29/29 genotype), subjects homozygous for the 28 allele had a significant risk 
increase (OR = 2.12, 95% CI = 1.09-4.12, Table 2). Subjects with the 28/29 
genotype had a non-significant increased risk relative to the 29/29 genotype (OR = 
1.30, 95% CI = 0.83-2.03), as did men with one 29 allele and one 24, 26, or 27 allele 
(OR = 1.33, 95% CI = 0.72-2.47) and men with one 29 allele and one 30, 31, or 32 
allele (OR = 1.58, 95% CI = 0.62-4.01). 

Table 2. Age-adjusted ORs for prostate cancer in relation to CAG/CAA repeat 
lengths in the AIB1/SRC-3 gene, China 



Allele 1    Allele 2    Cases   Controls   OR a    95% CI 
29          29          67      130        1.00 
28          29          59       88        1.30    0.83-2.03 
24,26,27    29          22       32        1.33    0.72-2.47 
30,31,32    29           9       11        1.58    0.62-4.01 
28          28          23       21        2.12    1.09-4.12 
26,27,28    30,31        5        7        1.38    0.42-4.52 
26,27       26,27,28     5        9        1.08    0.35-3.35 

≥29         ≥29         76      141        1.00 
≥29         <29         86      127        1.26    0.85-1.86 
<29         <29         28       30        1.73    0.96-3.11 

Based on the median CAG/CAA repeat length of 29, the various AIB1/SRC-3 
alleles were collapsed into those with 29 or more repeats (≥29) and those with less 
than 29 repeats (<29). Relative to men with the ≥29/≥29 genotype, men with one 
long and one short allele (≥29/<29 genotype) had a moderate but non-significant risk 
elevation (OR = 1.26, 95% CI = 0.85-1.86). Those with two short alleles (<29/<29 
genotype) had a non-significant 73% increased risk (OR = 1.73, 95% CI = 0.96-3.11) 
relative to men with two long alleles (≥29/≥29 genotype). 
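
For orientation, a crude (unadjusted) odds ratio and a Woolf-type 95% confidence interval can be computed directly from the case and control counts in Table 2. This Python sketch is not the age-adjusted unconditional logistic regression used in the study, so its results need not match the reported values exactly; the function name is hypothetical.

```python
import math


def crude_odds_ratio(cases_exp, controls_exp, cases_ref, controls_ref, z=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval
    for one genotype group compared with a reference group."""
    odds_ratio = (cases_exp * controls_ref) / (controls_exp * cases_ref)
    se_log = math.sqrt(1 / cases_exp + 1 / controls_exp
                       + 1 / cases_ref + 1 / controls_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts from Table 2; reference group is the >=29/>=29 genotype
# (76 cases, 141 controls).
one_short = crude_odds_ratio(86, 127, 76, 141)  # >=29/<29 genotype
two_short = crude_odds_ratio(28, 30, 76, 141)   # <29/<29 genotype

print("one short allele:  OR = %.2f (%.2f-%.2f)" % one_short)
print("two short alleles: OR = %.2f (%.2f-%.2f)" % two_short)
```

With these counts the crude estimates come out close to the age-adjusted odds ratios of 1.26 and 1.73 reported above, which suggests that age adjustment changed the estimates very little.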

Risks of prostate cancer associated with various repeat lengths in both the 
AIB1/SRC-3 gene CAG/CAA polymorphism as well as the two polymorphisms of 
the AR gene are shown in Table 3. Relative to men both homozygous for the long 
CAG/CAA allele (≥29/≥29 genotype) of the AIB1/SRC-3 gene and having a long AR 
CAG repeat length (≥23 repeats), men both homozygous for the short AIB1/SRC-3 
CAG/CAA allele (<29/<29) and having a short AR CAG repeat length (<23) had a 
significant 2.9-fold risk (OR = 2.86, 95% CI = 1.28-6.40). Similarly, those men both 
homozygous for the short AIB1/SRC-3 CAG/CAA allele and having a short GGN 
repeat length (<23) in the AR gene had a non-significant 2.3-fold risk (OR = 2.29, 
95% CI = 0.73-7.15) relative to men both homozygous for the long AIB1/SRC-3 
CAG/CAA allele (≥29/≥29 genotype) and having a long AR GGN repeat length (≥23 
repeats). 

Table 3. ORs a for prostate cancer in relation to AIB1/SRC-3 CAG/CAA and AR 
CAG or GGN repeat lengths, China 

                       CAG/CAA repeat length genotypes of the AIB1/SRC-3 gene 
AR polymorphisms       ≥29/≥29                      ≥29/<29                      <29/<29 
                       n1/n2 b  OR    95% CI        n1/n2 b  OR    95% CI        n1/n2 b  OR    95% CI 
CAG repeat length c 
  ≥23                  30/68    1.00  -             35/69    1.15  0.63-2.08     9/14     1.46  0.57-3.75 
  <23                  45/71    1.43  0.81-2.54     51/58    1.99  1.12-3.53     19/15    2.86  1.28-6.40 
GGN repeat length c 
  ≥23                  57/112   1.00  -             72/99    1.42  0.92-2.21     19/23    1.62  0.81-3.22 
  <23                  17/26    1.28  0.64-2.56     14/24    1.15  0.55-2.40     7/6      2.29  0.73-7.15 

a Adjusted for age. 
b n1 = number of cases, n2 = number of controls. 
c Median number of repeats among controls was used for the cutoff. 

There was no correlation between the repeat lengths in the AR and
AIB1/SRC-3 genes. In addition, the number of CAG/CAA repeats in the AIB1/SRC-3
gene did not correlate with education, body mass index, waist-to-hip ratio,
total caloric intake, serum levels of sex hormones (testosterone; DHT;
5α-androstane-3α,17β-diol glucuronide; and estradiol), or sex hormone binding
globulin. These variables therefore were not included in the logistic model for
adjustment. In addition, odds ratios were materially unchanged when the analysis
was stratified by clinical stage (localized versus advanced stage disease, data
not shown).

Results from this population-based case-control study in China indicate that
men with fewer than 29 CAG/CAA repeats in the AIB1/SRC-3 alleles have an
increased risk of clinically significant prostate cancer. Furthermore, our results
suggest that this effect, though independent of AR genotypes, is more pronounced
among men with a smaller number of AR CAG repeats.

The observed association with CAG/CAA repeat length in the AIB1/SRC-3
gene is biologically plausible. Data from transient transfection studies show
that the AIB1/SRC-3 coactivator enhances AR transcriptional activity in the
presence of DHT (12 AIB1), suggesting that the AIB1/SRC-3 coactivator, in
conjunction with AR, may increase androgenic activity within the prostate gland.
Amplification of the AIB1/SRC-3 gene has been implicated in the etiology of
several other hormone-dependent cancers as well, including breast and ovarian
cancers (24 AIB1). Furthermore, recent clinical data suggest that overexpression
of AR in prostate tumors may contribute to hormone sensitivity and tumor
progression (25 AIB1).

Racial/ethnic variation in the AIB1/SRC-3 CAG/CAA repeat length mirrors
the risk patterns of prostate cancer in high- and low-risk populations (19,26 AIB1),
thus indirectly supporting a role of AIB1/SRC-3 in prostate cancer etiology. In a
small survey of 112 African Americans, 19 Chinese, and 18 Caucasians, the allele
frequency of 29 CAG/CAA repeats was 61%, 76%, and 58%, respectively.

Chinese men have a longer mean CAG repeat length in the AR gene than
Western men, and a shorter AR CAG repeat length was associated with an
increased risk of prostate cancer in this low-risk population (9 AIB1). The
observed association with CAG/CAA repeat length in the AIB1/SRC-3 gene is
independent of the AR gene: regardless of CAG repeat length in the AR gene, men
homozygous for the shorter AIB1/SRC-3 allele (the <29/<29 genotype) had a higher
risk than those homozygous for the long allele (the ≥29/≥29 genotype). However,
the risk associated with the homozygous <29/<29 AIB1/SRC-3 genotype was more
pronounced among those with the short AR CAG repeat length.

2. Example 2. Relationship of CAG length in the androgen receptor

The length of the polymorphic CAG trinucleotide repeat in the
polyglutamine region of the androgen receptor (AR) gene is inversely correlated
with the transactivation function of the AR. A population-based case-control assay
in China addressed CAG and other polymorphisms of the AR gene and their
association with clinically significant prostate cancer in this low-risk population.
Genomic DNA from 190 prostate cancer patients and 304 healthy controls was used
for direct sequencing to evaluate the relationship of CAG and GGN (polyglycine)
repeat length in the AR gene. Relative to western men, the subjects had a longer
CAG repeat length, with a median of 23 and only 10% of the subjects having a CAG
repeat length shorter than 20. Men with a CAG repeat length shorter than 23
(median length) had a 65% increased risk of prostate cancer (OR=1.65, 95% CI
1.14-2.39), compared to men with a CAG repeat length of 23 or longer.

For the GGN tract (GGT3GGG1GGT2GGCn), based on the sequencing results
from 481 samples, it is shown that even though GGC regions in the polyglycine tract
are highly variable, there are no mutations or polymorphisms in the GGT and GGG
regions. Seventy-two percent of the subjects had a GGN repeat length of 23, and
those with a GGN repeat length shorter than 23 had a 12% increased risk of prostate
cancer (95% CI 0.71-1.78), compared to those with 23 or more GGN repeats. This
not only confirms that Chinese men do have a longer CAG repeat length than
western men, but also represents the first population assay to show that even in a
very low-risk population, a shorter CAG repeat length confers a higher risk of
clinically significant prostate cancer. These results indicate that CAG repeat length
can serve as a useful marker to identify a subset of individuals at higher risk of
developing clinically significant prostate cancer.

a) Materials and methods 
Subjects. Details of the assay have been described previously (28AR, which
is herein incorporated by reference for material related to the assay). Briefly,
cases of primary prostate cancer (ICD9 185) newly diagnosed between 1993 and 1995
were identified through a rapid reporting system established between the Shanghai
Cancer Institute (SCI) and 28 collaborating hospitals in urban Shanghai. Cases were
permanent residents in 10 urban districts of Shanghai (henceforth referred to as
Shanghai) who had no history of any other cancer. Of the 268 eligible cases
(representing 95% of the cases diagnosed in urban Shanghai during the study
period), 243 (91%) were interviewed in person by trained interviewers. Four of the
cases were later classified as having benign prostatic hyperplasia and excluded from
the study after a consensus review by both Chinese and U.S. pathologists.

Based on the records at the Shanghai Resident Registry, which contains
personal identification cards for all adult residents over age 18 in urban Shanghai,
healthy subjects who were free of cancer were selected randomly from permanent
residents of Shanghai (6.5 million), frequency-matched to the expected age
distribution (5-year category) of prostate cancer cases. Of the 495 eligible controls
without a history of cancer, 472 (95%) were interviewed.

Information on potential risk factors was elicited through an in-person
interview by trained interviewers using a structured questionnaire. The interview
included information on demographic characteristics; dietary history; smoking
history; consumption of alcohol and other beverages; medical history; family history
of cancer; physical activity; body size; and sexual behavior.

Blood collection and DNA extraction. Two hundred cases (84% of those
interviewed) and 330 controls (70%) provided 20 ml of fasting blood for the study.
The blood samples were processed within three hours of collection at a central
laboratory in Shanghai and stored at -70°C. The frozen buffy coat samples
(separated from 5 ml of blood) were later shipped to the U.S. on dry ice for DNA
extraction at the American Type Culture Collection (Manassas, Virginia) with
standard protocols. DNA purity, yield, and length were satisfactory and there was
no evidence of DNA degradation or RNA contamination. Following DNA
extraction, 191 cases and 304 controls had sufficient DNA for AR genotyping at the
University of Rochester. DNA samples for cases and controls were grouped into
pairs to minimize the effect of day-to-day laboratory variation. Laboratory personnel
were blinded to the case-control status.

Molecular analysis and assessment of the CAG and GGN repeats. As
part of an ongoing molecular analysis of the AR gene, genomic DNA from the 495
subjects was used to determine the usual sense codon sequence and the exact
number of CAG and GGN repeats in exon 1 of the AR gene through PCR
amplification and DNA sequencing. For the CAG repeat analysis, a set of
oligonucleotide primers that flank the CAG repeat,
5'-GCTCTGGGACGCAACCTCTCT-3' (SEQ ID NO:7) and
5'-GCAGCGACTACCGCATCATCA-3' (SEQ ID NO:8), were designed for PCR
amplification. A pair of nested primers,
5'-CGGG-GTAAGGGAAGTAGGTGGAAG-3' (SEQ ID NO:15) and
5'-CTCTACGATGGGCTTGGGGAGAAC-3' (SEQ ID NO:16), was selected for DNA
sequencing. For GGN analysis, the oligonucleotide primers
5'-ACCCTCAGCCGCCGCTTCCTCATC-3' (SEQ ID NO:11) and
5'-CTGGGATAGGGCACTCTGCTCAAC-3' (SEQ ID NO:12) were used for both
PCR amplification and sequencing. The PCR products of the CAG and GGN
repeats were amplified using the Advantage 2 Polymerase System (Clontech) and
the Advantage-GC Genomic Polymerase System (Clontech), respectively.
Subsequently, these PCR products were purified using the PCR Product Purification
Kit (Qiagen) and sequenced directly using the Big Dye Terminator Cycle
Sequencing Ready Reaction Kit (PE Biosystems). All reactions were optimized to
reach consistent results, using genomic DNA samples extracted from cell lines. For
the polyglutamine tract ((CAG)nCAA), the number of CAG triplets was counted to
yield the length of CAG repeats. For the polyglycine tract (GGT3GGG1GGT2GGCn)
(SEQ ID NO:17), the usual sense codon sequence of the GGN tract is: three GGT,
one GGG, two GGT, followed by a variable number of GGC repeats. For example,
a GGN repeat length of 23 in our study corresponded to a PCR fragment of 217 bp,
encompassing 3 GGT, 1 GGG, 2 GGT, and 17 GGC triplets.
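
The counting rules described above can be sketched in a few lines. This is an
illustrative sketch only, assuming the tract has already been excised from the
sequencing read as a plain string; the function names are hypothetical, and the
actual read-out procedure used in the assay is not specified beyond the text.

```python
import re


def count_cag_repeats(tract: str) -> int:
    """Count the CAG triplets in a (CAG)nCAA polyglutamine tract."""
    match = re.fullmatch(r"((?:CAG)+)CAA", tract.upper())
    if match is None:
        raise ValueError("tract does not have the form (CAG)nCAA")
    return len(match.group(1)) // 3


def count_ggn_repeats(tract: str) -> int:
    """Count all GGN triplets in a GGT3-GGG1-GGT2-GGCn polyglycine tract."""
    match = re.fullmatch(r"(?:GGT){3}GGG(?:GGT){2}((?:GGC)+)", tract.upper())
    if match is None:
        raise ValueError("tract does not have the form GGT3 GGG1 GGT2 GGCn")
    ggc_repeats = len(match.group(1)) // 3
    # Three GGT + one GGG + two GGT are fixed; only the GGC run varies.
    return 3 + 1 + 2 + ggc_repeats


# The worked example from the text: 17 GGC triplets give a GGN length of 23.
example_ggn = "GGT" * 3 + "GGG" + "GGT" * 2 + "GGC" * 17
ggn_length = count_ggn_repeats(example_ggn)
```

A tract that deviates from the expected codon pattern raises an error rather
than returning a count, which mirrors the observation that the GGT and GGG
portions did not vary among the samples.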

Because the PCR procedure is prone to contamination, a negative control
(water blank) was always included in each batch of PCR reactions (usually 9-18
samples plus one negative control). The assay for one batch (9 samples) was
repeated with new reagents because of an indication of minor contamination.
Because exon 1 of the AR gene is GC-rich with CAG and GGN repeats, this region
is difficult to amplify. Several samples had to be amplified and sequenced more
than once. Overall, five (1%) of the 495 samples could not be typed for CAG
repeats due to insufficient DNA or sequencing problems, while 14 (2.8%) could not
be typed for GGN repeats for similar reasons. The percentages of samples that were
unsuccessfully genotyped were similar in cases and controls.

Twenty-four split samples from the same individual were included as quality
control samples to assess the reproducibility of genotyping. Of the 24 quality
control samples, 23 and 20 were amplified and sequenced successfully for the CAG
and GGN repeats, respectively. Of the 23 samples with CAG results, 21 (91%) had
the same repeat length of 23, one had 24, and one had 22. Of the 20 samples with
GGN results, 19 (95%) had the same repeat length of 23 and the other had length 24.

Statistical Analysis. The mean numbers of CAG and GGN repeats were
compared in cases and controls using the t test. Unconditional logistic regression
models were used to estimate odds ratios (ORs) and their corresponding 95 percent
confidence intervals (CIs) for prostate cancer in relation to CAG and GGN repeat
lengths (29AR). Repeat lengths were examined first as continuous variables and
later as categorical variables. The distributions of the number of CAG or GGN
repeats among controls were used to derive the median or tertile cutoffs used to
calculate ORs. In addition, the combined effects of CAG and GGN were evaluated
based on the median lengths within the controls. The relationships between age,
CAG and GGN repeat length, and other variables were assessed by Spearman
correlation and analysis of variance.

b) Results 

Selected characteristics of cases and controls are shown in Table 4.
Compared to controls, cases had higher caloric intake and higher levels of education
and waist-to-hip ratio and were less likely to use cigarettes or alcohol. Age at
diagnosis ranged from 50 to 94 (median 73) for cancer cases. Sixty-nine cases
(36%) were diagnosed as having localized cancer, and most tumors (72%) were
moderately or poorly differentiated.

Table 4 Selected characteristics of prostate cancer cases and population
controls, China

                                    Cases (n=191)        Controls (n=304)
Characteristics                     Mean      (S.D.)     Mean      (S.D.)

Age (yrs)                           72.2      (7.7)      71.9      (7.3)
Total calories (Kcal/day)           2457.0    (647)      2342.0    (731)
Height (cm)                         167.9     (6.0)      167.6     (5.8)
Weight (Kg)                         61.3      (8.4)      61.5      (10.1)
BMI (Kg/m2)                         21.8      (2.9)      21.9      (3.3)
Waist circumference (cm)            82.6      (10.4)     82.5      (10.7)
Hip circumference (cm)              90.7      (8.9)      92.6      (8.5)
Waist-to-hip ratio                  [illegible]
% married                           [illegible]
% with education greater
  than high school                  [illegible]
% smokers                           56.5                 65.1
% alcohol users                     31.4                 42.1
Clinical stage (%)
  Localized                         36.3
  Regional                          30.5
  Remote                            32.1
Histologic grade (%)
  Well differentiated               7.9
  Moderately differentiated         31.0
  Poorly differentiated             41.0
  Cannot be assessed                20.1

Because the AR gene is located on the X chromosome, only one copy of the
gene is present in men. For the polyglutamine tract ((CAG)nCAA), there was no
variation in the CAA sequence among the 490 samples analyzed. The number of
CAG repeats ranged from 10 to 34. About 65% of the study subjects had a CAG
repeat length that ranged from 21 to 24, but only 1% of the subjects had a CAG
length longer than 30 repeats (Table 5).

Table 5 Distribution of number of CAG repeats in the androgen receptor gene in
prostate cancer cases and controls, China

                       Cases (n=190)       Controls (n=300)
No. of CAG repeats     N       %           N       %

10                     0       0.0         1       0.3
14                     0       0.0         1       0.3
15                     4       2.1         2       0.7
16                     3       1.6         2       0.7
17                     1       0.5         3       1.0
18                     7       3.7         9       3.0
19                     7       3.7         12      4.0
20                     13      6.8         17      5.7
21                     24      12.6        32      10.7
22                     57      30.0        67      22.3
23                     22      11.6        46      15.3
24                     21      11.1        48      16.0
25                     16      8.4         19      6.3
26                     9       4.7         17      5.7
27                     3       1.6         15      5.0
28                     2       1.1         2       0.7
29                     0       0.0         3       1.0
30                     0       0.0         1       0.3
31                     0       0.0         2       0.7
32                     0       0.0         0       0.0
33                     0       0.0         1       0.3
34                     1       0.5         0       0.0

Median                 22                  23



Although the median number of CAG repeats in controls was only slightly
larger than that in cases (23.0 vs. 22.0), there was a shift toward longer repeat length
among controls (Figure 1). For CAG repeat length shorter than 23, cases had higher
percentages than controls in 6 of the 10 categories. However, for CAG repeat length
longer than 22, controls had higher percentages than cases in 8 of the 12 categories.
Age at diagnosis and stage of cancer were not related to CAG repeat length, with
similar distribution and average number of CAG and GGN repeat lengths in various
age categories and three clinical stages.
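
As a check on the medians quoted above, the following sketch recomputes them
from the Table 5 frequency counts (rows with zero counts are omitted). The
helper name `median_from_counts` is hypothetical; it is a plain grouped-data
median and not a routine taken from the study.

```python
def median_from_counts(counts: dict) -> float:
    """Median of a sample given as {value: number of observations}."""
    total = sum(counts.values())
    # 1-based ranks of the middle observation(s); equal when total is odd.
    low_rank, high_rank = (total + 1) // 2, total // 2 + 1
    running = 0
    low_value = high_value = None
    for value in sorted(counts):
        running += counts[value]
        if low_value is None and running >= low_rank:
            low_value = value
        if running >= high_rank:
            high_value = value
            break
    return (low_value + high_value) / 2


# CAG repeat counts transcribed from Table 5.
cases = {15: 4, 16: 3, 17: 1, 18: 7, 19: 7, 20: 13, 21: 24, 22: 57,
         23: 22, 24: 21, 25: 16, 26: 9, 27: 3, 28: 2, 34: 1}
controls = {10: 1, 14: 1, 15: 2, 16: 2, 17: 3, 18: 9, 19: 12, 20: 17,
            21: 32, 22: 67, 23: 46, 24: 48, 25: 19, 26: 17, 27: 15,
            28: 2, 29: 3, 30: 1, 31: 2, 33: 1}

median_cases = median_from_counts(cases)
median_controls = median_from_counts(controls)

# Number of subjects below the control median of 23 repeats.
short_cases = sum(n for length, n in cases.items() if length < 23)
short_controls = sum(n for length, n in controls.items() if length < 23)
```

The counts below the cutoff (116 cases and 146 controls) agree with the "<23"
row of Table 7.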

For the polyglycine tract (GGT3GGG1GGT2GGCn) (SEQ ID NO:17), there
was no variation in the codon usage or the number of GGT and GGG trinucleotides
in all of the 481 samples analyzed, although the number of GGC repeats was highly
variable. The pattern was always three GGT, one GGG, two GGT, followed by a
variable number of GGC. The number of GGN repeats among study subjects ranged
from 15 to 27 (the number of GGC repeats thus ranged from 9 to 21) (Table 6).
About 72% of the study subjects had a GGN repeat length of 23.

Table 6 Distribution of number of GGN repeats in the androgen receptor gene in
prostate cancer cases and controls, China

                       Cases (n=187)       Controls (n=295)
No. of GGN repeats     N       %           N       %

14                     0       0.0         1       0.3
15                     1       0.5         1       0.3
16                     2       1.1         1       0.3
17                     2       1.1         0       0.0
18                     0       0.0         1       0.3
19                     19      10.2        20      6.8
20                     2       1.1         2       0.7
21                     3       1.6         0       0.0
22                     10      5.3         24      8.2
23                     136     72.7        212     72.1
24                     10      5.3         24      8.2
25                     2       1.1         1       0.3
27                     0       0.0         1       0.3

Median                 23                  23



Risks of prostate cancer associated with CAG and GGN repeat lengths are
shown in Table 7. When the number of CAG repeats was included in the model as a
continuous variable, there was a 7% increase in the risk of prostate cancer for each
decrement in length of one CAG repeat (OR=1.07, 95% CI 1.00-1.15). The risks
associated with decrements of three and six repeats were 1.21 (95% CI 1.14-1.32) and
1.42 (95% CI 1.22-1.61), respectively. When the median repeat length was used to
dichotomize study subjects, men with a CAG repeat length shorter than 23 had a
65% increased risk (OR=1.65, 95% CI 1.14-2.39), compared to men with a CAG
repeat length of 23 or longer. Relative to the highest tertile of CAG repeat length
(≥24), men in the second and first tertiles (22-23 and <22) had ORs of 1.45 and 1.55,
respectively (P for trend = 0.06).
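
For orientation, the dichotomized estimate can be approximated from the Table 7
cell counts alone. The sketch below computes a crude (unadjusted) odds ratio
with a Woolf-type 95% confidence interval; this is an assumption made for
illustration, since the reported values come from age-adjusted logistic
regression. The function name is hypothetical.

```python
import math


def crude_odds_ratio(exposed_cases, exposed_controls,
                     reference_cases, reference_controls, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval."""
    odds_ratio = (exposed_cases * reference_controls) / (
        exposed_controls * reference_cases)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                          + 1 / reference_cases + 1 / reference_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Table 7, CAG median split: <23 (116 cases, 146 controls)
# versus 23 or longer (74 cases, 154 controls).
or_cag, ci_low, ci_high = crude_odds_ratio(116, 146, 74, 154)
```

With these counts the crude estimate lands very close to the reported
age-adjusted OR of 1.65 (95% CI 1.14-2.39), which is consistent with age having
little confounding effect here.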

Table 7 Odds ratios (ORs) a and 95% confidence intervals (CIs) for prostate
cancer in relation to the number of CAG and GGN repeats in the
androgen receptor gene, China

                                        Cases    Controls
Number of CAG and GGN repeats           No.      No.        OR a    95% CI

No. of CAG repeats
  Continuous (per decrement of one
  CAG repeat)                           190      300        1.07    1.00-1.15
  Median
    ≥23                                 74       154        1.00    -
    <23                                 116      146        1.65    1.14-2.39
  Tertile
    ≥24                                 52       108        1.00    -
    22-23                               79       113        1.45    0.93-2.25
    <22                                 59       79         1.55    0.96-2.49
    Linear trend                                                    p=0.06
No. of GGN repeats
  Continuous (per decrement of one
  GGN repeat)                           187      294        1.07    0.96-1.20
  Median
    ≥23                                 147      239        1.00    -
    <23                                 39       56         1.12    0.71-1.78
Combined number of CAG and GGN repeats
    CAG ≥23, GGN ≥23                    53       120        1.00    -
    CAG ≥23, GGN <23                    19       29         1.48    0.76-2.88
    CAG <23, GGN ≥23                    94       115        1.85    1.21-2.82
    CAG <23, GGN <23                    20       26         1.75    0.90-3.41

a Adjusted for age (continuous).

Similarly, men with a shorter GGN repeat length had a higher risk of prostate
cancer. Each decrement of one GGN repeat length was associated with a 7%
increase in risk (OR=1.07, 95% CI 0.96-1.20). Men with a GGN repeat length
shorter than the median length of 23 had a 12% increase in prostate cancer risk,
compared to those with 23 or more repeats. Because more than 72% of the subjects
had 23 GGN repeats, the ORs by tertiles for GGN repeats were not estimated.

Also shown in Table 7 are the ORs associated with combined categories of
CAG and GGN repeat lengths. Men with both CAG and GGN repeat lengths shorter
than 23 had a 75% elevated risk of prostate cancer. There was little correlation
between the number of CAG and GGN repeats (r=-0.03, p>0.05).

The number of CAG or GGN repeats did not correlate with age, education,
body mass index, waist-to-hip ratio, total calories, smoking, or drinking. These
variables therefore were not included in the model for adjustment. The ORs were
materially unchanged after further adjustment for benign prostatic hyperplasia
(BPH), although the cases had a higher prevalence of BPH (57% vs. 23%) and there
was a non-significant moderate association between CAG or GGN repeat lengths
and BPH (data are reported separately). Associations of CAG or GGN repeat length
were similar across all stages of disease at diagnosis (data not shown).

These results confirm that a shorter CAG repeat length is associated with an
increased risk of clinically significant prostate cancer. A shorter length of GGN
repeat also appears to increase the risk of prostate cancer, but the magnitude of
excess risk was smaller.

The observed inverse association with AR polymorphisms is biologically
plausible, as laboratory studies have shown that a long polyglutamine chain (>30
repeats) in the AR gene is associated with androgen insensitivity and reduced AR
transactivation activity (13, WAR). In vitro transfection studies also have
demonstrated that elimination of the polyglutamine tracts results in elevated
transcriptional activities (9,11,16AR). Clinical studies have suggested that alteration
in the AR function, either through polymorphisms of CAG repeat length or somatic
mutations, may be associated with tumor progression. For example, the progression
from latent to clinically invasive prostate cancer is initially androgen-dependent,
although some tumors later become androgen-independent (thus becoming
non-responsive to hormonal treatment). Several non-germline-related changes of the
AR gene, including amplification of the AR gene (usually a key step in the transition
from a hormone-sensitive to a hormone-refractory state in prostate tumors)
(31,32AR), AR somatic mutations (identified throughout transactivation, DNA
binding, and ligand binding domains) (33,34AR), and contraction of CAG repeat
length in cancer cells (32AR), have been shown to be associated with tumor
aggressiveness, cancer progression, and failure of hormonal therapy. AR expression
studies in the majority of prostate tumors, including those that have become
refractory to hormonal therapy, also suggest that AR plays a key role in
androgen-independent tumors (35,36AR).

The inverse relationship between CAG repeat length and AR transcriptional
activity (thus androgen sensitivity) is the currently recognized underlying molecular
mechanism by which AR polymorphisms modulate prostate cancer risk. Because
transcriptional activation of the AR gene is influenced by not only polymorphisms in
the AR gene but also a number of other factors, including tissue levels of
dihydrotestosterone (DHT), estradiol, insulin-like growth factors, and AR
coactivators (37-43AR), it is likely that these factors may also affect prostate cancer
risk by mediating transcriptional activities. Several AR coactivators, including
AR-associated proteins (ARA70, ARA55), AIB1 (Amplified in Breast Cancer-1), CBP
(cyclic AMP responsive element binding protein), Rb, and BRCA1, have been
shown to enhance AR-mediated transcriptional activity from 2- to 10-fold,
suggesting that in vivo coactivators are essential in attaining optimal AR
transactivation in response to androgens (40-43AR).

It has been suggested that variations in CAG repeat length in the AR gene
between populations may explain part of the large racial difference in prostate cancer
risk and that a shorter CAG repeat length reported for African Americans may
contribute to some of their higher risk of prostate cancer, although currently no data
are available from this population. Our results confirm that, relative to western men,
Chinese men do indeed have a longer CAG repeat length. For example, 22% of the
1,722 white men in two U.S. studies (17,18AR) had a CAG repeat length shorter
than 20 vs. only 10% in our study and 55% reported for African Americans in a
cross-sectional survey (26,27AR). Inverse associations have also been reported for
Caucasians, suggesting that the underlying biological mechanism in various racial
groups may be similar and that the polymorphisms of AR may be related, in part, to
racial difference in prostate cancer risk.

The common polymorphism of the AR gene confers variable risk upon all
individuals, which in turn may result in a much larger proportion of prostate cancer
cases attributable to having fewer CAG repeats. Assuming that the CAG
polymorphism association is causal, it is estimated that 25% (95% CI 9% to 41%) of
the cases in Shanghai can be attributed to a CAG repeat length shorter than 23.
Using the CAG repeat length distribution in the two U.S. studies among white men
(17,18AR), it was estimated that 3-7% of cases in U.S. white men can be
attributed to the CAG polymorphism (repeat length <23) and that this polymorphism
alone potentially accounts for at least 5% of the difference in incidence between
Chinese and U.S. men.
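
The attributable-fraction figure for Shanghai can be approximated with a
standard case-based formula, PAF = p_c (OR - 1) / OR, where p_c is the
proportion of cases carrying the short allele. The method actually used for the
estimate above is not stated, so the formula choice is an assumption; the
inputs are the Table 7 counts (116 of 190 cases with fewer than 23 repeats) and
the reported OR of 1.65.

```python
def attributable_fraction(exposed_case_fraction: float, odds_ratio: float) -> float:
    """Case-based population attributable fraction: p_c * (OR - 1) / OR."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return exposed_case_fraction * (odds_ratio - 1) / odds_ratio


# 116 of the 190 genotyped cases had a CAG repeat length shorter than 23.
paf_shanghai = attributable_fraction(116 / 190, 1.65)
```

This gives roughly 24%, in line with the reported estimate of 25%; the small
difference may reflect rounding of the OR or a different estimator.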

Similar to two previous studies (17,18AR), it was found that the number of
GGN repeats clusters around 23 (in the study of Stanford et al., only the number of
GGC repeats was counted and 15 was the peak of the repeat, which corresponds to
21 GGN repeats), and that a shorter GGN repeat length appears to be associated with
a moderate increase in prostate cancer risk. Twenty-three GGN repeats may
represent the coding sequence for optimal AR protein conformation and activity,
because over 70% of the study subjects in our study as well as in studies of western
men had a GGN repeat length of 23.

Although it is well established that (GGC)n repeats in the polyglycine tract
(GGT3GGG1GGT2GGCn) (SEQ ID NO:17) of the AR gene are polymorphic, to date
there has been little information on variations in the GGG and GGT regions of the
polyglycine tract, because these regions are GC-rich and technically it has been
difficult to amplify these regions. Our study represents the first successful effort to
sequence the exact codon usage and number of the GGN trinucleotide repeats in a
large number of population-based samples. It was shown that GGT and GGG
regions were quite stable and there were no variations in these two regions in all of
the 481 DNA samples analyzed.

K Sequences

SEQ ID NO:1 first primer
5'-TCA TCA CCT CCG ACA ACA GAG G-3'

SEQ ID NO:2 second primer
5'-TAT GGA AAC TGT TGC GGA GGA G-3'

SEQ ID NO:3 complement to first primer
5'-CCT CTG TTG TCG GAG GTG ATG A-3'

SEQ ID NO:4 complement to second primer
5'-CTC CTC CGC AAC AGT TTC CAT A-3'

SEQ ID NO:5 general CAG/CAA sequence
5'-(CAG)n CAA (CAG)n (CAACAG)* CAG CAA CAG CAG CAA-3'
Nucleotides 1-3 and 6-9, as triplets, must be at least one triplet but can be any
multiple of triplets.

SEQ ID NO:6 one particular 29 mer sequence
5'-CAG CAG CAG CAG CAG CAG CAA CAG CAG CAG CAG CAG CAG CAG
CAG CAG CAA CAG CAA CAG CAA CAG CAA CAG CAG CAA CAG CAG
CAA-3'

SEQ ID NO:7 first androgen receptor CAG primer
5'-GCT CTG GGA CGC AACCTCTCT-3'

SEQ ID NO:8 second androgen receptor CAG primer
5'-GCA GCG ACT ACC GCA TCA TCA-3'

SEQ ID NO:9 complement to first androgen receptor CAG primer
5'-AGAGAGGTTGCGTCCCAGAGC-3'

SEQ ID NO:10 complement to second androgen receptor CAG primer
5'-TGATGATGCGGTAGTCGCTGC-3'

SEQ ID NO:11 first androgen receptor GGN primer
5'-ACCCTCAGCCGCCGCTTCCTCATC-3'

SEQ ID NO:12 second androgen receptor GGN primer
5'-CTGGGATAGGGCACTCTGCTCAAC-3'

SEQ ID NO:13 complement to first androgen receptor GGN primer
5'-GATGAGGAAGCGGCGGCTGAGGGT-3'

SEQ ID NO:14 complement to second androgen receptor GGN primer
5'-GTTGAGCAGAGTGCCCTATCCCAG-3'

SEQ ID NO:15
5'-CGGG-GTAAGGGAAGTAGGTGGAAG-3'

SEQ ID NO:16
5'-CTCTACGATGGGCTTGGGGAGAAC-3'

SEQ ID NO:17
GGT3GGG1GGT2GGCn

SEQ ID NO: 18 AIB1 sequence Genbank accession number XM_030032 
1 cggcagcggc tgcggcttag tcggtggcgg ccggcggcgg ctgcgggctg agcggcgagt 
61 ttccgattta aagctgagct gcgaggaaaa tggcggcggg aggatcaaaa tacttgctgg 
121 atggtggact cagagaccaa taaaaataaa ctgcttgaac atcctttgac tggttagcca 
181 gttgctgatg tatattcaag atgagtggat taggagaaaa cttggatcca ctggccagtg 
241 attcacgaaa acgcaaattg ccatgtgata ctccaggaca aggtcttacc tgcagtggtg 
301 aaaaacggag acgggagcag gaaagtaaat atattgaaga attggctgag ctgatatctg 
361 ccaatcttag tgatattgac aatttcaatg tcaaaccaga taaatgtgcg attttaaagg 
421 aaacagtaag acagatacgt caaataaaag agcaaggaaa aactatttcc aatgatgatg 
481 atgttcaaaa agccgatgta tcttctacag ggcagggagt tattgataaa gactccttag 
541 gaccgctttt acttcaggca ttggatggtt tcctatttgt ggtgaatcga gacggaaaca 
601 ttgtatttgt atcagaaaat gtcacacaat acctgcaata taagcaagag gacctggtta 
661 acacaagtgt ttacaatatc ttacatgaag aagacagaaa ggattttctt aagaatttac 
721 caaaatctac agttaatgga gtttcctgga caaatgagac ccaaagacaa aaaagccata 
781 catttaattg ccgtatgttg atgaaaacac cacatgatat tctggaagac ataaacgcca 
841 gtcctgaaat gcgccagaga tatgaaacaa tgcagtgctt tgccctgtct cagccacgag 
901 ctatgatgga ggaaggggaa gatttgcaat cttgtatgat ctgtgtggca cgccgcatta 
961 ctacaggaga aagaacattt ccatcaaacc ctgagagctt tattaccaga catgatcttt 
1021 caggaaaggt tgtcaatata gatacaaatt cactgagatc ctccatgagg cctggctttg 
1081 aagatataat ccgaaggtgt attcagagat tttttagtct aaatgatggg cagtcatggt 
1141 cccagaaacg tcactatcaa gaagcttatc ttaatggcca tgcagaaacc ccagtatatc 
1201 gattctcgtt ggctgatgga actatagtga ctgcacagac aaaaagcaaa ctcttccgaa 
1261 atcctgtaac aaatgatcga catggctttg tctcaaccca cttccttcag agagaacaga 
1321 atggatatag accaaaccca aatcctgttg gacaagggat tagaccacct atggctggat 
1381 gcaacagttc ggtaggcggc atgagtatgt cgccaaacca aggcttacag atgccgagca 
1441 gcagggccta tggcttggca gaccctagca ccacagggca gatgagtgga gctaggtatg 
1501 ggggttccag taacatagct tcattgaccc ctgggccagg catgcaatca ccatcttcct 
1561 accagaacaa caactatggg ctcaacatga gtagcccccc acatgggagt cctggtcttg 
1621 ccccaaacca gcagaatatc atgatttctc ctcgtaatcg tgggagtcca aagatagcct 
1681 cacatcagtt ttctcctgtt gcaggtgtgc actctcccat ggcatcttct ggcaatactg 
1741 ggaaccacag cttttccagc agctctctca gtgccctgca agccatcagt gaaggtgtgg 
1801 ggacttccct tttatctact ctgtcatcac caggccccaa attggataac tctcccaata 
1861 tgaatattac ccaaccaagt aaagtaagca atcaggattc caagagtcct ctgggctttt 
1921 attgcgacca aaatccagtg gagagttcaa tgtgtcagtc aaatagcaga gatcacctca 
1981 gtgacaaaga aagtaaggag agcagtgttg agggggcaga gaatcaaagg ggtcctttgg 
2041 aaagcaaagg tcataaaaaa ttactgcagt tacttacctg ttcttctgat gaccggggtc 
2101 attcctcctt gaccaactcc cccctagatt caagttgtaa agaatcttct gttagtgtca 
2161 ccagcccctc tggagtctcc tcctctacat ctggaggagt atcctctaca tccaatatgc 
2221 atgggtcact gttacaagag aagcaccgga ttttgcacaa gttgctgcag aatgggaatt 
2281 caccagctga ggtagccaag attactgcag aagccactgg gaaagacacc agcagtataa 
2341 cttcttgtgg ggacggaaat gttgtcaagc aggagcagct aagtcctaag aagaaggaga 
2401 ataatgcact tcttagatac ctgctggaca gggatgatcc tagtgatgca ctctctaaag
2461 aactacagcc ccaagtggaa ggagtggata ataaaatgag tcagtgcacc agctccacca
2521 ttcctagctc aagtcaagag aaagacccta aaattaagac agagacaagt gaagagggat 
2581 ctggagactt ggataatcta gatgctattc ttggtgatct gactagttct gacttttaca 
2641 ataattccat atcctcaaat ggtagtcatc tggggactaa gcaacaggtg tttcaaggaa
2701 ctaattctct gggtttgaaa agttcacagt ctgtgcagtc tattcgtcct ccatataacc
2761 gagcagtgtc tctggatagc cctgtttctg ttggctcaag tcctccagta aaaaatatca 
2821 gtgctttccc catgttacca aagcaaccca tgttgggtgg gaatccaaga atgatggata 
2881 gtcaggaaaa ttatggctca agtatgggtg ggccaaaccg aaatgtgact gtgactcaga 
2941 ctccttcctc aggagactgg ggcttaccaa actcaaaggc cggcagaatg gaacctatga
3001 attcaaactc catgggaaga ccaggaggag attataatac ttctttaccc agacctgcac
3061 tgggtggctc tattcccaca ttgcctcttc ggtctaatag cataccaggt gcgagaccag 
3121 tattgcaaca gcagcagcag atgcttcaaa tgaggcctgg tgaaatcccc atgggaatgg 
3181 gggctaatcc ctatggccaa gcagcagcat ctaaccaact gggttcctgg cccgatggca 
3241 tgttgtccat ggaacaagtt tctcatggca ctcaaaatag gcctcttctt aggaattccc
3301 tggatgatct tgttgggcca ccttccaacc tggaaggcca gagtgacgaa agagcattat
3361 tggaccagct gcacactctt ctcagcaaca cagatgccac aggcctggaa gaaattgaca 
3421 gagctttggg cattcctgaa cttgtcaatc agggacaggc attagagccc aaacaggatg 
3481 ctttccaagg ccaagaagca gcagtaatga tggatcagaa ggcaggatta tatggacaga 
3541 catacccagc acaggggcct ccaatgcaag gaggctttca tcttcaggga caatcaccat 

3601 cttttaactc tatgatgaat cagatgaacc agcaaggcaa ttttcctctc caaggaatgc

3661 acccacgagc caacatcatg agaccccgga caaacacccc caagcaactt agaatgcagc 
3721 ttcagcagag gctgcagggc cagcagtttt tgaatcagag ccgacaggca cttgaattga 
3781 aaatggaaaa ccctactgct ggtggtgctg cggtgatgag gcctatgatg cagccccagc 
3841 agggttttct taatgctcaa atggtcgccc aacgcagcag agagctgcta agtcatcact 

3901 tccgacaaca gagggtggct atgatgatgc agcagcagca gcagcagcaa cagcagcagc
3961 agcagcagca gcagcagcaa cagcaacagc aacagcaaca gcagcaacag cagcaaaccc 
4021 aggccttcag cccacctcct aatgtgactg cttcccccag catggatggg cttttggcag 
4081 gacccacaat gccacaagct cctccgcaac agtttccata tcaaccaaat tatggaatgg 
4141 gacaacaacc agatccagcc tttggtcgag tgtctagtcc tcccaatgca atgatgtcgt 

4201 caagaatggg tccctcccag aatcccatga tgcaacaccc gcaggctgca tccatctatc
4261 agtcctcaga aatgaagggc tggccatcag gaaatttggc caggaacagc tccttttccc 
4321 agcagcagtt tgcccaccag gggaatcctg cagtgtatag tatggtgcac atgaatggca 
4381 gcagtggtca catgggacag atgaacatga accccatgcc catgtctggc atgcctatgg
4441 gtcctgatca gaatactgct gacatctctg caccaggacc tcttaaggaa accactgtac 

4501 aaatgacact gcactaggat tattgggaag gaatcattgt tccaggcatc catcttggaa
4561 gaaaggacca gctttgagct ccatcaaggg tattttaagt gatgtcattt gagcaggact 
4621 ggattttaag ccgaagggca atatctacgt gtttttcccc cctccttctg ctgtgtatca 
4681 tggtgttcaa aacagaaatg ttttttggca ttccacctcc tagggatata attctggaga 
4741 catggagtgt tactgatcat aaaacttttg tgtcactttt ttctgccttg ctagccaaaa 

4801 tctcttaaat acacgtaggt gggccagaga acattggaag aatcaagaga gattagaata
4861 tctggtttct ctagttgcag tattggacaa agagcatagt cccagccttc aggtgtagta 
4921 gttctgtgtt gaccctttgt ccagtggaat tggtgattct gaattgtcct ttactaatgg 
4981 tgttgagttg ctctgtccct attatttgcc ctaggctttc tcctaatgaa ggttttcatt 
5041 tgccattcat gtcctgtaat acttcacctc caggaactgt catggatgtc caaatggctt 
5101 tgcagaaagg aaatgagatg acagtattta atcgcagcag tagcaaactt ttcacatgct
5161 aatgtgcagc tgagtgcact ttatttaaaa agaatggata aatgcaatat tcttgaggtc 
5221 ttgagggaat agtgaaacac attcctggtt tttgcctaca cttacgtgtt agacaagaac 
5281 tatgattttt ttttttaaag tactggtgtc accctttgcc tatatggtag agcaataatg 
5341 ctttttaaaa ataaacttct gaaaacccaa ggccaggtac tgcattctga atcagaatct 

5401 cgcagtgttt ctgtgaatag atttttttgt aaatatgacc tttaagatat tgtattatgt
5461 aaaatatgta tatacctttt tttgtaggtc acaacaactc atttttacag agtttgtgaa
5521 gctaaatatt taacattgtt gatttcagta agctgtgtgg tgaggctacc agtggaagag 
5581 acatcccttg acttttgtgg cctgggggag gggtagtgct ccacagcttt tccttcccca
5641 ccccccagcc ttagatgcct cgctcttttc aatctcttaa tctaaatgct ttttaaagag 

5701 attatttgtt tagatgtagg cattttaatt ttttaaaaat tcctctacca gaactaagca

5761 ctttgttaat ttggggggaa agaatagata tggggaaata aacttaaaaa aaaatcagga 
5821 atttaaaaaa acgagcaatt tgaagagaat cttttggatt ttaagcagtc cgaaataata 
5881 gcaattcatg ggctgtgtgt gtgtgtgtat gtgtgtgtgt gtgtgtgtat gtttaattat 
5941 gttacctttt catccccttt aggagcgttt tcagattttg gttgctaaga cctgaatccc 

6001 atattgagat ctcgagtaga atccttggtg tggtttctgg tgtctgctca gctgtcccct
6061 cattctacta atgtgatgct ttcattatgt ccctgtggat tagaatagtg tcagttattt 
6121 cttaagtaac tcagtaccca gaacagccag ttttactgtg attcagagcc acagtctaac 
6181 tgagcacctt ttaaacccct ccctcttctg ccccctacca cttttctgct gttgcctctc 
6241 tttgacacct gttttagtca gttgggagga agggaaaaat caagtttaat tccctttatc 

6301 tgggttaatt catttggttc aaatagttga cggaattggg tttctgaatg tctgtgaatt
6361 tcagaggtct ctgctagcct tggtatcatt ttctagcaat aactgagagc cagttaattt 
6421 taagaatttc acacatttag ccaatctttc tagatgtctc tgaaggtaag atcatttaat 
6481 atctttgata tgcttacgag taagtgaatc ctgattattt ccagacccac caccagagtg 
6541 gatcttattt tcaaagcagt atagacaatt atgagtttgc cctctttccc ctaccaagtt 

6601 caaaatatat ctaagaaaga ttgtaaatcc gaaaacttcc attgtagtgg cctgtgcttt
6661 tcagatagta tactctcctg tttggagaca gaggaagaac caggtcagtc tgtctctttt 
6721 tcagctcaat tgtatctgac ccttctttaa gttatgtgtg tggggagaaa tagaatggtg 
6781 ctcttatctt tcttgacttt aaaaaaatta ttaaaaacaa aaaaaaaata aa 



SEQ ID NO:19

MSGLGENIJ>PIJ^SDSRERKLPCDTPGQGLT(^GEKRRREQES^ 

BEELAELISANI^DrDNFNVKPDKCAILKETW 

TGQGVIDKDSLGPIJXQALDGFLJF^^ 

LHEEDRKDHJKNUPKS^ 
QRYFIMQCFALSQPRAMMEEGEDLQSCNflCVAKRIl^

VVNIDTNSUISSMRPGFEDIIRRCIQRFFSI^GQS 

FSIJU^GTIVTAQTKSKIiTRNPVTNDRH 

GCNSSVGGMSMSPNQGIXJMPSSRAYGIADPSTTGQMSGARYGGSSNIASLTPG^ 

PSSYQNNNYGIJNMSSPPHGSPGI^^ 
SSGOTGNHSFSSSSI^ALQAISEGVGTS^

SKSPLGFYO)QNPVESSMC(^NSRDHI^DKESKESS 

LTCSSDDRGHSSLTNSPUDSSC^ 

RIIjmXQNGNSPAEVAKITAEATC 

I1I)W)DPSDAI^KELQP 
NIJDAILGDLTSSDF^

SUDSPVSVGSSPPVKMSAFPMLPKQPMLGGNPRMMDSQEOT 

PSSGDWGIPNSKAGRMEPMNSNSMGRPGGm^ 

PVLQQQQQMLQMRPGEIPMGMGANPYGQA^ 
RNSII)DLVGPPSNI£GQSDERAII^

EPKQDAFQGQEAAVMMDQKAGLYGQTYPAQGPPMQGGFH^ 
NFPLQGMHPRANIMRPR^ 

VMRPMMQPQQGFLNAQMVAQRSRE1XSHHFRQQRVAMMM 
Q 

QQQQQQQQQQTQAFSPPPNVTASPSMDGIXAGPTMPQAPPQQEPYQPNYGMGQQPDPA
FGRVSSPPNAMMSSRMGPSQNPMMQHPQAASrYQSSE^ 
HQGNPAVYSMVHMNGSSGHMGQMNMNPMPMSGMPM 
LH 

SEQ ID NO:20

1: NM_000044

1 cgagatcccg gggagccagc ttgctgggag agcgggacgg tccggagcaa gcccacaggc 
61 agaggaggcg acagagggaa aaagggccga gctagccgct ccagtgctgt acaggagccg 
121 aagggacgca ccacgccagc cccagcccgg ctccagcgac agccaacgcc tcttgcagcg 
181 cggcggcttc gaagccgccg cccggagctg ccctttcctc ttcggtgaag tttttaaaag 

241 ctgctaaaga ctcggaggaa gcaaggaaag tgcctggtag gactgacggc tgcctttgtc
301 ctcctcctct ccaccccgcc tccccccacc ctgccttccc cccctccccc gtcttctctc 
361 ccgcagctgc ctcagtcggc tactctcagc caacccccct caccaccctt ctccccaccc 
421 gcccccccgc ccccgtcggc ccagcgctgc cagcccgagt ttgcagagag gtaactccct 
481 ttggctgcga gcgggcgagc tagctgcaca ttgcaaagaa ggctcttagg agccaggcga 

541 ctggggagcg gcttcagcac tgcagccacg acccgcctgg ttagaattcc ggcggagaga
601 accctctgtt ttcccccact ctctctccac ctcctcctgc cttccccacc ccgagtgcgg 
661 agcagagatc aaaagatgaa aaggcagtca ggtcttcagt agccaaaaaa caaaacaaac 
721 aaaaacaaaa aagccgaaat aaaagaaaaa gataataact cagttcttat ttgcacctac 
781 ttcagtggac actgaatttg gaaggtggag gattttgttt ttttctttta agatctgggc 

841 atcttttgaa tctacccttc aagtattaag agacagactg tgagcctagc agggcagatc
901 ttgtccaccg tgtgtcttct tctgcacgag actttgaggc tgtcagagcg ctttttgcgt 
961 ggttgctccc gcaagtttcc ttctctggag cttcccgcag gtgggcagct agctgcagcg 
1021 actaccgcat catcacagcc tgttgaactc ttctgagcaa gagaagggga ggcggggtaa 
1081 gggaagtagg tggaagattc agccaagctc aaggatggaa gtgcagttag ggctgggaag 

1141 ggtctaccct cggccgccgt ccaagaccta ccgaggagct ttccagaatc tgttccagag

1201 cgtgcgcgaa gtgatccaga acccgggccc caggcaccca gaggccgcga gcgcagcacc 
1261 tcccggcgcc agtttgctgc tgctgcagca gcagcagcag cagcagcagc agcagcagca 
1321 gcagcagcag cagcagcagc agcagcaaga gactagcccc aggcagcagc agcagcagca 
1381 gggtgaggat ggttctcccc aagcccatcg tagaggcccc acaggctacc tggtcctgga 

1441 tgaggaacag caaccttcac agccgcagtc ggccctggag tgccaccccg agagaggttg
1501 cgtcccagag cctggagccg ccgtggccgc cagcaagggg ctgccgcagc agctgccagc 
1561 acctccggac gaggatgact cagctgcccc atccacgttg tccctgctgg gccccacttt 
1621 ccccggctta agcagctgct ccgctgacct taaagacatc ctgagcgagg ccagcaccat 
1681 gcaactcctt cagcaacagc agcaggaagc agtatccgaa ggcagcagca gcgggagagc 

1741 gagggaggcc tcgggggctc ccacttcctc caaggacaat tacttagggg gcacttcgac
1801 catttctgac aacgccaagg agttgtgtaa ggcagtgtcg gtgtccatgg gcctgggtgt 
1861 ggaggcgttg gagcatctga gtccagggga acagcttcgg ggggattgca tgtacgcccc 
1921 acttttggga gttccacccg ctgtgcgtcc cactccttgt gccccattgg ccgaatgcaa 
1981 aggttctctg ctagacgaca gcgcaggcaa gagcactgaa gatactgctg agtattcccc 

2041 tttcaaggga ggttacacca aagggctaga aggcgagagc ctaggctgct ctggcagcgc
2101 tgcagcaggg agctccggga cacttgaact gccgtctacc ctgtctctct acaagtccgg
2161 agcactggac gaggcagctg cgtaccagag tcgcgactac tacaactttc cactggctct 
2221 ggccggaccg ccgccccctc cgccgcctcc ccatccccac gctcgcatca agctggagaa 
2281 cccgctggac tacggcagcg cctgggcggc tgcggcggcg cagtgccgct atggggacct 
2341 ggcgagcctg catggcgcgg gtgcagcggg acccggttct gggtcaccct cagccgccgc 

2401 ttcctcatcc tggcacactc tcttcacagc cgaagaaggc cagttgtatg gaccgtgtgg

2461 tggtggtggg ggtggtggcg gcggcggcgg cggcggcggc ggcggcggcg gcggcggcgg 
2521 cggcggcggc gaggcgggag ctgtagcccc ctacggctac actcggcccc ctcaggggct 
2581 ggcgggccag gaaagcgact tcaccgcacc tgatgtgtgg taccctggcg gcatggtgag 
2641 cagagtgccc tatcccagtc ccacttgtgt caaaagcgaa atgggcccct ggatggatag 

2701 ctactccgga ccttacgggg acatgcgttt ggagactgcc agggaccatg ttttgcccat
2761 tgactattac tttccacccc agaagacctg cctgatctgt ggagatgaag cttctgggtg 
2821 tcactatgga gctctcacat gtggaagctg caaggtcttc ttcaaaagag ccgctgaagg 
2881 gaaacagaag tacctgtgcg ccagcagaaa tgattgcact attgataaat tccgaaggaa 
2941 aaattgtcca tcttgtcgtc ttcggaaatg ttatgaagca gggatgactc tgggagcccg 

3001 gaagctgaag aaacttggta atctgaaact acaggaggaa ggagaggctt ccagcaccac
3061 cagccccact gaggagacaa cccagaagct gacagtgtca cacattgaag gctatgaatg 
3121 tcagcccatc tttctgaatg tcctggaagc cattgagcca ggtgtagtgt gtgctggaca 
3181 cgacaacaac cagcccgact cctttgcagc cttgctctct agcctcaatg aactgggaga 
3241 gagacagctt gtacacgtgg tcaagtgggc caaggccttg cctggcttcc gcaacttaca 

3301 cgtggacgac cagatggctg tcattcagta ctcctggatg gggctcatgg tgtttgccat
3361 gggctggcga tccttcacca atgtcaactc caggatgctc tacttcgccc ctgatctggt 
3421 tttcaatgag taccgcatgc acaagtcccg gatgtacagc cagtgtgtcc gaatgaggca 
3481 cctctctcaa gagtttggat ggctccaaat caccccccag gaattcctgt gcatgaaagc 
3541 actgctactc ttcagcatta ttccagtgga tgggctgaaa aatcaaaaat tctttgatga 

3601 acttcgaatg aactacatca aggaactcga tcgtatcatt gcatgcaaaa gaaaaaatcc
3661 cacatcctgc tcaagacgct tctaccagct caccaagctc ctggactccg tgcagcctat 
3721 tgcgagagag ctgcatcagt tcacttttga cctgctaatc aagtcacaca tggtgagcgt 
3781 ggactttccg gaaatgatgg cagagatcat ctctgtgcaa gtgcccaaga tcctttctgg 
3841 gaaagtcaag cccatctatt tccacaccca gtgaagcatt ggaaacccta tttccccacc 

3901 ccagctcatg ccccctttca gatgtcttct gcctgttata actctgcact actcctctgc
3961 agtgccttgg ggaatttcct ctattgatgt acagtctgtc atgaacatgt tcctgaattc 
4021 tatttgctgg gctttttttt tctctttctc tcctttcttt ttcttcttcc ctccctatct 
4081 aaccctccca tggcaccttc agactttgct tcccattgtg gctcctatct gtgttttgaa 
4141 tggtgttgta tgcctttaaa tctgtgatga tcctcatatg gcccagtgtc aagttgtgct 

4201 tgtttacagq actactctgt gccagccaca caaacgttta cttatcttat gccacgggaa

4261 gtttagagag ctaagattat ctggggaaat caaaacaaaa aacaagcaaa caaaaaaaaa 
4321 a

SEQ ID NO:21

1: NM_000044

MEVQLGLGRVYPRPPSKTYRGAFQNLFQSVREVIQNPGPRHPEA

ASAAPPGASLLLLQQQQQQQQQQQQQQQQQQQQQETSPRQQQQQQGEDGSPQAHRRGP 

TGYLVLDEEQQPSQPQSALECHPERGCVPEPGAAVAASKGLPQQLPAPPDEDDSAAPS 

TLSIiGPTFPCaJSSCSADLKDILSEASTMQlXQQ 

SKDNYLGGTSTISDNAKELCK^^ 

VRPTPCAPIJVECKGSIIX>DSAGKSTEDTAEYSPFKGGYTK
GTLELPSTI^LYKSGALDEAAAYQSRDYYNI^

YGSAWAAAAAQCRYGDLASIJIGAGAAGPGSGSPSAAASSSWH^ 

GGGGGGGGGGGGGGGGGGGGGGEAGAVAPYGYTEPPQGIAGQ^ 

SRWYPSPTCVKSEMGPWMDSYSGPY 

SGCHYGALTCGSCKVFFKRAAE^ 
TLGARKLKKLGNLK1XJEEGEASS

GWCAGHDNNQPDSFAAIi^SLNELGERQLVHVVKW 

WMGLMVFAMGWRSFT^ 

ITPQEFLCMKAIJJLFSIIPVDGIJ^ 

YQLTKUX>SVQPIAimLH^^ 
YFHTQ
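The relationship between the nucleotide listing of SEQ ID NO:20 and the amino acid listing of SEQ ID NO:21 can be checked with a short translation sketch. The code below is illustrative only: the function name `translate` and the constant names are assumptions, the standard genetic code is assumed, and the input is the start of the coding region as printed above (the ATG at position 1115 of SEQ ID NO:20).

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 0, stopping at the first stop codon."""
    dna = "".join(dna.lower().split())  # drop the spaces used in the listing
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X = unreadable codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 36 bases of the coding region as printed in SEQ ID NO:20.
AR_CDS_START = "atggaagtgc agttagggct gggaagggtc taccct"
print(translate(AR_CDS_START))  # MEVQLGLGRVYP
```

The printed residues agree with the first twelve residues of SEQ ID NO:21 as listed.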
What is claimed is:

1. A method for assessing the risk of prostate cancer in a human subject 
comprising determining the length of the contiguous CAG or CAA repeats in both 
AIB1 gene alleles of the subject and assessing whether the length of the CAG or 
CAA repeats is less than, equal to, or greater than 29 repeats, a length less than or
greater than 29 repeats in both alleles indicating an increased risk of prostate cancer
in the subject. 

2. The method of claim 1, wherein determining the length of the repeats 
comprises amplifying a region of both AIB1 gene alleles comprising the contiguous 
CAG or CAA repeats.

3. The method of claim 2, wherein the amplification is by PCR that
produces two PCR products.

4. The method of claim 3, further comprising analyzing the PCR products by 
chromatography. 

5. The method of claim 4, wherein the chromatography is gel 
electrophoresis.

6. The method of claim 2, wherein the sequence of the PCR products is 
determined. 

7. The method of claim 3, wherein the PCR product is produced using a first 
AIB1 primer that selectively hybridizes with the sequence set forth in SEQ ID NO:3
and a second AIB1 primer that selectively hybridizes with the sequence set forth in
SEQ ID NO:4.

8. The method of claim 7, wherein the first AIB1 primer has the sequence 
set forth in SEQ ID NO:1 and the second AIB1 primer has the sequence set forth in
SEQ ID NO:2.

9. The method of claim 1, wherein determining how many CAG or CAA
repeats there are comprises sequencing the CAG or CAA repeats.
10. A method for assessing the risk of prostate cancer in a human subject
comprising determining the length of the contiguous CAG or CAA repeats in the
AIB1 gene alleles of the subject and assessing whether the length of the CAG or
CAA repeats in each allele is less than, equal to, or greater than 29 repeats, and
determining the length of the contiguous CAG or CAA repeats in the androgen
receptor gene of the subject and assessing whether the length of the CAG or CAA
repeats is less than, equal to, or greater than 23 repeats, a length in at least one allele
less than or greater than 29 repeats in the AIB1 gene and less than or greater than 23
repeats in the androgen receptor gene indicating an increased risk of prostate cancer
in the subject.

11. The method of claim 10, wherein determining the length of the
contiguous CAG or CAA repeats in the AIB1 gene alleles comprises amplifying a
region of the AIB1 gene alleles comprising the CAG or CAA repeats and wherein
determining the length of the CAG or CAA repeats in the androgen receptor gene
comprises amplifying a region of the androgen receptor gene comprising the CAG or
CAA repeats.

12. The method of claim 11, wherein the amplification of the regions of the
AIB1 gene and the androgen receptor gene is by PCR that produces a first and a
second AIB1 PCR product and an androgen receptor PCR product.

13. The method of claim 12, further comprising analyzing the PCR products 
by chromatography.

14. The method of claim 13, wherein the chromatography is gel 
electrophoresis. 

15. The method of claim 12, wherein the sequence of the PCR products is 
determined. 

16. The method of claim 12, wherein the AIB1 PCR product is produced
using a first AIB1 primer that selectively hybridizes with the sequence set forth in
SEQ ID NO:3 and a second AIB1 primer that selectively hybridizes with the
sequence set forth in SEQ ID NO:4.
17. The method of claim 16, wherein the first AIB1 primer has the sequence
set forth in SEQ ID NO:1 and the second AIB1 primer has the sequence set forth in
SEQ ID NO:2.

18. The method of claim 12, wherein the androgen receptor PCR product is 
produced using a first androgen receptor CAG primer that selectively hybridizes
with the sequence set forth in SEQ ID NO:9 and a second androgen receptor CAG
primer that selectively hybridizes with the sequence set forth in SEQ ID NO:10.

19. The method of claim 18, wherein the first androgen receptor CAG 
primer has the sequence set forth in SEQ ID NO:7 and the second androgen receptor 
CAG primer has the sequence set forth in SEQ ID NO:8. 

20. The method of claim 12, wherein the AIB1 PCR product is produced
using a first AIB1 primer that selectively hybridizes with the sequence set forth in
SEQ ID NO:3 and a second AIB1 primer that selectively hybridizes with the
sequence set forth in SEQ ID NO:4, and wherein the androgen receptor PCR product
is produced using a first androgen receptor CAG primer that selectively hybridizes
with the sequence set forth in SEQ ID NO:9 and a second androgen receptor CAG
primer that selectively hybridizes with the sequence set forth in SEQ ID NO:10.

21. The method of claim 20, wherein the first AIB1 primer has the sequence 
set forth in SEQ ID NO:1 and the second AIB1 primer has the sequence set forth in
SEQ ID NO:2 and wherein the first androgen receptor CAG primer has the sequence
set forth in SEQ ID NO:7 and the second androgen receptor CAG primer has the
sequence set forth in SEQ ID NO:8. 

22. The method of claim 10, wherein more than 29 contiguous CAG or 
CAA repeats in at least one allele of the AIB1 gene of the person and more than 23 
contiguous CAG or CAA repeats in the androgen receptor gene of the person
indicates an increased risk of prostate cancer in the subject.

23. The method of claim 10, wherein more than 29 contiguous CAG or 
CAA repeats in at least one allele of the AIB1 gene of the person and less than 23 
contiguous CAG or CAA repeats in the androgen receptor gene of the person 
indicates an increased risk of prostate cancer in the subject.

24. The method of claim 10, wherein less than 29 contiguous CAG or CAA 
repeats in at least one allele of the AIB1 gene of the person and more than 23 
contiguous CAG or CAA repeats in the androgen receptor gene of the person 
indicates an increased risk of prostate cancer in the subject. 

25. The method of claim 10, wherein less than 29 contiguous CAG or CAA
repeats in at least one allele of the AIB1 gene of the person and less than 23
contiguous CAG or CAA repeats in the androgen receptor gene of the person 
indicates an increased risk of prostate cancer in the subject. 

26. The method of claim 10, wherein more or less than 29 contiguous CAG 
or CAA repeats in both alleles of the AIB1 gene of the person and more than 23
contiguous CAG or CAA repeats in the androgen receptor gene of the person
indicates an increased risk of prostate cancer in the subject. 

27. The method of claim 10, wherein more or less than 29 contiguous CAG 
or CAA repeats in both alleles of the AIB1 gene of the person and less than 23
contiguous CAG or CAA repeats in the androgen receptor gene of the person
indicates an increased risk of prostate cancer in the subject. 

28. The method of claim 10, wherein more than 29 contiguous CAG or 
CAA repeats in both alleles of the AIB1 gene of the person and more than 23 
contiguous CAG or CAA repeats in the androgen receptor gene of the person
indicates an increased risk of prostate cancer in the subject.

29. The method of claim 10, wherein less than 29 contiguous CAG or CAA 
repeats in both alleles of the AIB1 gene of the person and less than 23 contiguous 
CAG or CAA repeats in the androgen receptor gene of the person indicates an 
increased risk of prostate cancer in the subject. 

30. A method for assessing the risk of prostate cancer in a human subject
comprising determining the length of the contiguous CAG or CAA repeats in the
AIB1 gene alleles of the subject and assessing whether the length of the CAG or
CAA repeats in each allele is less than, equal to, or greater than 29 repeats, and
determining the length of the contiguous GGN repeats in the androgen receptor gene
of the subject and assessing whether the length of the GGN repeats is less than,
equal to, or greater than 23 repeats, a length in at least one allele less than or greater
than 29 repeats in the AIB1 gene and less than or greater than 23 repeats in the
androgen receptor gene indicating an increased risk of prostate cancer in the subject,
wherein N is either T, G, or C.

31. The method of claim 30, wherein determining the length of the 
contiguous CAG or CAA repeats in the AIB1 gene alleles comprises amplifying a
region of the AIB1 gene alleles comprising the CAG or CAA repeats and wherein
determining the length of the contiguous GGN repeats in the androgen receptor gene
comprises amplifying a region of the androgen receptor gene comprising the GGN
repeats. 

32. The method of claim 31, wherein the amplification of the regions of the 
AIB1 gene and the androgen receptor gene is by PCR that produces a first and a 
second AIB1 PCR product and an androgen receptor PCR product.

33. The method of claim 32, further comprising analyzing the PCR products
by chromatography.

34. The method of claim 33, wherein the chromatography is gel 
electrophoresis. 

35. The method of claim 32, wherein the sequence of the PCR products is 
determined.

36. The method of claim 32, wherein the AIB1 PCR product is produced 
using a first AIB1 primer that selectively hybridizes with the sequence set forth in
SEQ ID NO:3 and a second AIB1 primer that selectively hybridizes with the
sequence set forth in SEQ ID NO:4. 

37. The method of claim 36, wherein the first AIB1 primer has the sequence
set forth in SEQ ID NO:1 and the second AIB1 primer has the sequence set forth in
SEQ ID NO:2.
38. The method of claim 32, wherein the androgen receptor PCR product is
produced using a first androgen receptor GGN primer that selectively hybridizes
with the sequence set forth in SEQ ID NO:13 and a second androgen receptor GGN
primer that selectively hybridizes with the sequence set forth in SEQ ID NO:14.

39. The method of claim 38, wherein the first androgen receptor GGN 
primer has the sequence set forth in SEQ ID NO:11 and the second androgen
receptor GGN primer has the sequence set forth in SEQ ID NO:12.

40. The method of claim 32, wherein the AIB1 PCR product is produced
using a first AIB1 primer that selectively hybridizes with the sequence set forth in
SEQ ID NO:3 and a second AIB1 primer that selectively hybridizes with the
sequence set forth in SEQ ID NO:4, and wherein the androgen receptor PCR product
is produced using a first androgen receptor GGN primer that selectively hybridizes
with the sequence set forth in SEQ ID NO:13 and a second androgen receptor GGN
primer that selectively hybridizes with the sequence set forth in SEQ ID NO:14.

41. The method of claim 40, wherein the first AIB1 primer has the sequence
set forth in SEQ ID NO:1 and the second AIB1 primer has the sequence set forth in
SEQ ID NO:2 and wherein the first androgen receptor GGN primer has the sequence
set forth in SEQ ID NO:11 and the second androgen receptor GGN primer has the
sequence set forth in SEQ ID NO:12.

42. The method of claim 30, wherein more than 29 contiguous CAG or 
CAA repeats in at least one allele of the AIB1 gene of the person and more than 23
contiguous GGN repeats in the androgen receptor gene of the person indicates an
increased risk of prostate cancer in the subject. 

43. The method of claim 30, wherein more than 29 contiguous CAG or 
CAA repeats in at least one allele of the AIB1 gene of the person and less than 23
contiguous GGN repeats in the androgen receptor gene of the person indicates an
increased risk of prostate cancer in the subject. 

44. The method of claim 30, wherein less than 29 contiguous CAG or CAA 
repeats in at least one allele of the AIB1 gene of the person and more than 23
contiguous GGN repeats in the androgen receptor gene of the person indicates an
increased risk of prostate cancer in the subject. 

45. The method of claim 30, wherein less than 29 contiguous CAG or CAA 
repeats in at least one allele of the AIB1 gene of the person and less than 23 
contiguous GGN repeats in the androgen receptor gene of the person indicates an
increased risk of prostate cancer in the subject.

46. The method of claim 30, wherein more or less than 29 contiguous CAG 
or CAA repeats in both alleles of the AIB1 gene of the person and more than 23 
contiguous GGN repeats in the androgen receptor gene of the person indicates an 
increased risk of prostate cancer in the subject. 

47. The method of claim 30, wherein more or less than 29 contiguous CAG
or CAA repeats in both alleles of the AIB1 gene of the person and less than 23
contiguous GGN repeats in the androgen receptor gene of the person indicates an 
increased risk of prostate cancer in the subject. 

48. The method of claim 30, wherein more than 29 contiguous CAG or 
CAA repeats in both alleles of the AIB1 gene of the person and more than 23
contiguous GGN repeats in the androgen receptor gene of the person indicates an
increased risk of prostate cancer in the subject. 

49. The method of claim 30, wherein less than 29 contiguous CAG or CAA 
repeats in both alleles of the AIB1 gene of the person and less than 23 contiguous
GGN repeats in the androgen receptor gene of the person indicates an increased risk
of prostate cancer in the subject. 

50. A method for assessing the risk of prostate cancer in a human subject 
comprising determining the length of the contiguous CAG or CAA repeats in an 
AIB1 gene of the subject and assessing whether the length of the CAG repeats is less
than, equal to, or greater than 29 repeats, a length less than or greater than 29 repeats
indicating an increased risk of prostate cancer in the subject. 

51. A method for assessing the risk of prostate cancer in a human subject 
comprising determining the length of the contiguous CAG or CAA repeats in an 
AIB1 gene of the subject and assessing whether the length of the CAG or CAA
repeats is less than, equal to, or greater than 29 repeats, and determining the length
of the contiguous CAG or CAA repeats in the androgen receptor gene of the subject
and assessing whether the length of the CAG or CAA repeats is less than, equal to,
or greater than 23 repeats, a length less than or greater than 29 repeats in the AIB1
allele and less than or greater than 23 repeats in the androgen receptor gene
indicating an increased risk of prostate cancer in the subject. 

52. A method for assessing the risk of prostate cancer in a human subject 
comprising determining the length of the contiguous CAG or CAA repeats in an 
AIB1 gene of the subject and assessing whether the length of the CAG or CAA
repeats is less than, equal to, or greater than 29 repeats, and determining the length
of the contiguous GGN repeats in the androgen receptor gene of the subject and
assessing whether the length of the GGN repeats is less than, equal to, or greater
than 23 repeats, a length less than or greater than 29 repeats in the AIB1 allele and
less than or greater than 23 repeats in the androgen receptor gene indicating an
increased risk of prostate cancer in the subject, wherein N is either T, G, or C.

53. A kit for assessing a subject's risk for acquiring prostate cancer, comprising the 
oligonucleotides set forth in SEQ ID Nos: 1 and 2. 

54. A composition comprising the primers having the sequence set forth in SEQ ID 
Nos: 1 and 2. 
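The comparisons recited in claims 1 and 10 can be written down as a small sketch. It only restates the claimed thresholds (29 repeats for AIB1, 23 for the androgen receptor) as boolean tests. The function and constant names are hypothetical and do not appear in the application, and the code is not a clinical tool.

```python
from typing import Sequence

AIB1_REFERENCE = 29  # repeat count recited for the AIB1 gene
AR_REFERENCE = 23    # repeat count recited for the androgen receptor gene


def _check_two_alleles(aib1_alleles: Sequence[int]) -> None:
    if len(aib1_alleles) != 2:
        raise ValueError("expected repeat counts for exactly two AIB1 alleles")


def claim1_condition(aib1_alleles: Sequence[int]) -> bool:
    """True when both AIB1 alleles are shorter or longer than 29 repeats."""
    _check_two_alleles(aib1_alleles)
    return all(count != AIB1_REFERENCE for count in aib1_alleles)


def claim10_condition(aib1_alleles: Sequence[int], ar_repeats: int) -> bool:
    """True when at least one AIB1 allele differs from 29 repeats and the
    androgen receptor repeat count differs from 23."""
    _check_two_alleles(aib1_alleles)
    aib1_differs = any(count != AIB1_REFERENCE for count in aib1_alleles)
    return aib1_differs and ar_repeats != AR_REFERENCE


print(claim1_condition((28, 30)))       # True
print(claim1_condition((29, 30)))       # False
print(claim10_condition((29, 30), 21))  # True
print(claim10_condition((29, 29), 21))  # False
```

The dependent claims (for example claims 22 to 29) narrow these conditions to specific directions, such as more than 29 together with less than 23; they could be expressed the same way with `<` and `>` in place of `!=`.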
SEQUENCE LISTING

<110> Chang, Chawnshang 
Hsing, Ann 

<120> METHODS AND COMPOSITIONS FOR PREDICTING 
PROSTATE CANCER 

<130> 21108.000101 

<160> 21 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 22 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence : /note = 
synthetic construct 

<221> misc_feature
<222> (0)...(0)

<400> 1 

tcatcacctc cgacaacaga gg 22 

<210> 2 
<211> 22 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: /note = 
synthetic construct 

<400> 2 

tatggaaact gttgcggagg ag 22 

<210> 3 
<211> 22 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: /note =
synthetic construct 

<400> 3 

cctctgttgt cggaggtgat ga 22 

<210> 4 
<211> 22 
<212> DNA 

<213> Artificial Sequence 



<220> 
<223> Description of Artificial Sequence: /note =
synthetic construct 



<400> 4 

ctcctccgca acagtttcca ta 22
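The claims distinguish the primer sequences (SEQ ID NOs:1 and 2) from the sequences they hybridize with (SEQ ID NOs:3 and 4). As listed, each target is the reverse complement of its primer, which a few lines can confirm. This is a sketch for checking the listing; the helper name `reverse_complement` and the constant names are illustrative assumptions.

```python
# Complement map for lower-case DNA, as used in the listing.
COMPLEMENT = str.maketrans("acgt", "tgca")


def clean(seq: str) -> str:
    """Remove the spacing used in the listing and normalise case."""
    return "".join(seq.lower().split())


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


SEQ_ID_NO_1 = "tcatcacctc cgacaacaga gg"
SEQ_ID_NO_2 = "tatggaaact gttgcggagg ag"
SEQ_ID_NO_3 = "cctctgttgt cggaggtgat ga"
SEQ_ID_NO_4 = "ctcctccgca acagtttcca ta"

print(reverse_complement(SEQ_ID_NO_1) == clean(SEQ_ID_NO_3))  # True
print(reverse_complement(SEQ_ID_NO_2) == clean(SEQ_ID_NO_4))  # True
```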



<210> 5 
<211> 48 
<212> DNA 

<213> Artificial Sequence 
<220> 

<221> misc_feature
<222> 1, 2, 3, 7, 8, 9

<223> Sequence can be repeated one or more times 

<223> Description of Artificial Sequence : /note = 
synthetic construct 

<400> 5 

cagcaacagc aacagcaaca gcaacagcaa cagcagcaac agcagcaa 48 
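Determining "the length of the contiguous CAG or CAA repeats" from a sequence read amounts to finding the longest uninterrupted run of CAG/CAA codons. The sketch below shows one way to do that with a regular expression. The function name `longest_cag_caa_run` is hypothetical, and the approach is an illustration, not the sizing method of the application, which describes PCR followed by gel electrophoresis or sequencing.

```python
import re


def longest_cag_caa_run(seq: str) -> int:
    """Length, in codons, of the longest contiguous run of CAG/CAA triplets.

    Every start position is examined through a lookahead, so the run is
    found regardless of reading frame.
    """
    seq = "".join(seq.lower().split())  # drop listing spaces, normalise case
    best = 0
    for match in re.finditer(r"(?=((?:ca[ag])+))", seq):
        best = max(best, len(match.group(1)) // 3)
    return best


SEQ_ID_NO_5 = "cagcaacagc aacagcaaca gcaacagcaa cagcagcaac agcagcaa"
print(longest_cag_caa_run(SEQ_ID_NO_5))  # 16
```

SEQ ID NO:5 is 48 nucleotides long and consists entirely of CAG and CAA codons, so the run length is 16.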

<210> 6
<211> 87
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: /note =
synthetic construct

<400> 6

cagcagcagc agcagcagca acagcagcag cagcagcagc agcagcagca acagcaacag 60
caacagcaac agcagcaaca gcagcaa 87

<210> 7
<211> 21
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: /note =
synthetic construct

<400> 7

gctctgggac gcaacctctc t 21

<210> 8
<211> 21
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: /note =
synthetic construct

<400> 8

gcagcgacta ccgcatcatc a 21

<210> 9 
<211> 21
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: /note =
synthetic construct

<400> 9

agagaggttg cgtcccagag c 21

<210> 10 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: /note =
synthetic construct

<400> 10

tgatgatgcg gtagtcgctg c 21

<210> 11 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: /note =
synthetic construct

<400> 11

accctcagcc gccgcttcct catc 24

<210> 12 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: /note =
synthetic construct

<400> 12

ctgggatagg gcactctgct caac 24

<210> 13 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: /note =
synthetic construct

<400> 13

gatgaggaag cggcggctga gggt 24



<210> 14 
<211> 24
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence : /note = 
synthetic construct 

<400> 14 

gttgagcaga gtgccctatc ccag 24

<210> 15 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence : /note = 
synthetic construct 

<400> 15 

cggggtaagg gaagtaggtg gaag 24

<210> 16 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: /note = 
synthetic construct 

<400> 16 

ctctacgatg ggcttgggga gaac 24

<210> 17 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<221> misc_feature 
<222> 21 

<223> Sequence can be repeated one or more times 

<223> Description of Artificial Sequence : /note = 
synthetic construct 

<400> 17 

ggtggtggtg ggggtggtgg c 21

<210> 18 
<211> 6832 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: /note =
synthetic construct 



WO 02/10452 



PCT/US01/23834 



5 



<400> 18 

cggcagcggc tgcggcttag tcggtggcgg ccggcggcgg ctgcgggctg agcggcgagt 60 

ttccgattta aagctgagct gcgaggaaaa tggcggcggg aggatcaaaa tacttgctgg 120 

atggtggact cagagaccaa taaaaataaa ctgcttgaac atcctttgac tggttagcca 180 

gttgctgatg tatattcaag atgagtggat taggagaaaa cttggatcca ctggccagtg 240 

attcacgaaa acgcaaattg ccatgtgata ctccaggaca aggtcttacc tgcagtggtg 300 

aaaaacggag acgggagcag gaaagtaaat atattgaaga attggctgag ctgatatctg 360 

ccaatcttag tgatattgac aatttcaatg tcaaaccaga taaatgtgcg attttaaagg 420 

aaacagtaag acagatacgt caaataaaag agcaaggaaa aactatttcc aatgatgatg 480 

atgttcaaaa agccgatgta tcttctacag ggcagggagt tattgataaa gactccttag 540 

gaccgctttt acttcaggca ttggatggtt tcctatttgt ggtgaatcga gacggaaaca 600 

ttgtatttgt atcagaaaat gtcacacaat acctgcaata taagcaagag gacctggtta 660 

acacaagtgt ttacaatatc ttacatgaag aagacagaaa ggattttctt aagaatttac 720 

caaaatctac agttaatgga gtttcctgga caaatgagac ccaaagacaa aaaagccata 780 

catttaattg ccgtatgttg atgaaaacac cacatgatat tctggaagac ataaacgcca 840 

gtcctgaaat gcgccagaga tatgaaacaa tgcagtgctt tgccctgtct cagccacgag 900 

ctatgatgga ggaaggggaa gatttgcaat cttgtatgat ctgtgtggca cgccgcatta 960 

ctacaggaga aagaacattt ccatcaaacc ctgagagctt tattaccaga catgatcttt 1020 

caggaaaggt tgtcaatata gatacaaatt cactgagatc ctccatgagg cctggctttg 1080 

aagatataat ccgaaggtgt attcagagat tttttagtct aaatgatggg cagtcatggt 1140 

cccagaaacg tcactatcaa gaagcttatc ttaatggcca tgcagaaacc ccagtatatc 1200 

gattctcgtt ggctgatgga actatagtga ctgcacagac aaaaagcaaa ctcttccgaa 1260 

atcctgtaac aaatgatcga catggctttg tctcaaccca cttccttcag agagaacaga 1320 

atggatatag accaaaccca aatcctgttg gacaagggat tagaccacct atggctggat 1380 

gcaacagttc ggtaggcggc atgagtatgt cgccaaacca aggcttacag atgccgagca 1440 

gcagggccta tggcttggca gaccctagca ccacagggca gatgagtgga gctaggtatg 1500 

ggggttccag taacatagct tcattgaccc ctgggccagg catgcaatca ccatcttcct 1560 

accagaacaa caactatggg ctcaacatga gtagcccccc acatgggagt cctggtcttg 1620 

ccccaaacca gcagaatatc atgatttctc ctcgtaatcg tgggagtcca aagatagcct 1680 

cacatcagtt ttctcctgtt gcaggtgtgc actctcccat ggcatcttct ggcaatactg 1740 

ggaaccacag cttttccagc agctctctca gtgccctgca agccatcagt gaaggtgtgg 1800 

ggacttccct tttatctact ctgtcatcac caggccccaa attggataac tctcccaata 1860

tgaatattac ccaaccaagt aaagtaagca atcaggattc caagagtcct ctgggctttt 1920 

attgcgacca aaatccagtg gagagttcaa tgtgtcagtc aaatagcaga gatcacctca 1980 

gtgacaaaga aagtaaggag agcagtgttg agggggcaga gaatcaaagg ggtcctttgg 2040 

aaagcaaagg tcataaaaaa ttactgcagt tacttacctg ttcttctgat gaccggggtc 2100 

attcctcctt gaccaactcc cccctagatt caagttgtaa agaatcttct gttagtgtca 2160 

ccagcccctc tggagtctcc tcctctacat ctggaggagt atcctctaca tccaatatgc 2220 

atgggtcact gttacaagag aagcaccgga ttttgcacaa gttgctgcag aatgggaatt 2280 

caccagctga ggtagccaag attactgcag aagccactgg gaaagacacc agcagtataa 2340 

cttcttgtgg ggacggaaat gttgtcaagc aggagcagct aagtcctaag aagaaggaga 2400 

ataatgcact tcttagatac ctgctggaca gggatgatcc tagtgatgca ctctctaaag 2460

aactacagcc ccaagtggaa ggagtggata ataaaatgag tcagtgcacc agctccacca 2520 

ttcctagctc aagtcaagag aaagacccta aaattaagac agagacaagt gaagagggat 2580 

ctggagactt ggataatcta gatgctattc ttggtgatct gactagttct gacttttaca 2640 

ataattccat atcctcaaat ggtagtcatc tggggactaa gcaacaggtg tttcaaggaa 2700 

ctaattctct gggtttgaaa agttcacagt ctgtgcagtc tattcgtcct ccatataacc 2760 

gagcagtgtc tctggatagc cctgtttctg ttggctcaag tcctccagta aaaaatatca 2820 

gtgctttccc catgttacca aagcaaccca tgttgggtgg gaatccaaga atgatggata 2880 

gtcaggaaaa ttatggctca agtatgggtg ggccaaaccg aaatgtgact gtgactcaga 2940 

ctccttcctc aggagactgg ggcttaccaa actcaaaggc cggcagaatg gaacctatga 3000 

attcaaactc catgggaaga ccaggaggag attataatac ttctttaccc agacctgcac 3060 

tgggtggctc tattcccaca ttgcctcttc ggtctaatag cataccaggt gcgagaccag 3120 

tattgcaaca gcagcagcag atgcttcaaa tgaggcctgg tgaaatcccc atgggaatgg 3180 

gggctaatcc ctatggccaa gcagcagcat ctaaccaact gggttcctgg cccgatggca 3240 

tgttgtccat ggaacaagtt tctcatggca ctcaaaatag gcctcttctt aggaattccc 3300 

tggatgatct tgttgggcca ccttccaacc tggaaggcca gagtgacgaa agagcattat 3360 

tggaccagct gcacactctt ctcagcaaca cagatgccac aggcctggaa gaaattgaca 3420 

gagctttggg cattcctgaa cttgtcaatc agggacaggc attagagccc aaacaggatg 3480 

ctttccaagg ccaagaagca gcagtaatga tggatcagaa ggcaggatta tatggacaga 3540 
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catacccagc acaggggcct ccaatgcaag gaggctttca tcttcaggga caatcaccat 3600 

cttttaactc tatgatgaat cagatgaacc agcaaggcaa ttttcctctc caaggaatgc 3660 

acccacgagc caacatcatg agaccccgga caaacacccc caagcaactt agaatgcagc 3720 

ttcagcagag gctgcagggc cagcagtttt tgaatcagag ccgacaggca cttgaattga 3780 

aaatggaaaa ccctactgct ggtggtgctg cggtgatgag gcctatgatg cagccccagc 3840 

agggttttct taatgctcaa atggtcgccc aacgcagcag agagctgcta agtcatcact 3900 

tccgacaaca gagggtggct atgatgatgc agcagcagca gcagcagcaa cagcagcagc 3960 

agcagcagca gcagcagcaa cagcaacagc aacagcaaca gcagcaacag cagcaaaccc 4020 

aggccttcag cccacctcct aatgtgactg cttcccccag catggatggg cttttggcag 4080 

gacccacaat gccacaagct cctccgcaac agtttccata tcaaccaaat tatggaatgg 4140 

gacaacaacc agatccagcc tttggtcgag tgtctagtcc tcccaatgca atgatgtcgt 4200 

caagaatggg tccctcccag aatcccatga tgcaacaccc gcaggctgca tccatctatc 4260 

agtcctcaga aatgaagggc tggccatcag gaaatttggc caggaacagc tccttttccc 4320 

agcagcagtt tgcccaccag gggaatcctg cagtgtatag tatggtgcac atgaatggca 4380 

gcagtggtca catgggacag atgaacatga accccatgcc catgtctggc atgcctatgg 4440 

gtcctgatca gaatactgct gacatctctg caccaggacc tcttaaggaa accactgtac 4500 

aaatgacact gcactaggat tattgggaag gaatcattgt tccaggcatc catcttggaa 4560 

gaaaggacca gctttgagct ccatcaaggg tattttaagt gatgtcattt gagcaggact 4620 

ggattttaag ccgaagggca atatctacgt gtttttcccc cctccttctg ctgtgtatca 4680 

tggtgttcaa aacagaaatg ttttttggca ttccacctcc tagggatata attctggaga 4740 

catggagtgt tactgatcat aaaacttttg tgtcactttt ttctgccttg ctagccaaaa 4800 

tctcttaaat acacgtaggt gggccagaga acattggaag aatcaagaga gattagaata 4860 

tctggtttct ctagttgcag tattggacaa agagcatagt cccagccttc aggtgtagta 4920 

gttctgtgtt gaccctttgt ccagtggaat tggtgattct gaattgtcct ttactaatgg 4980 

tgttgagttg ctctgtccct attatttgcc ctaggctttc tcctaatgaa ggttttcatt 5040 

tgccattcat gtcctgtaat acttcacctc caggaactgt catggatgtc caaatggctt 5100 

tgcagaaagg aaatgagatg acagtattta atcgcagcag tagcaaactt ttcacatgct 5160 

aatgtgcagc tgagtgcact ttatttaaaa agaatggata aatgcaatat tcttgaggtc 5220 

ttgagggaat agtgaaacac attcctggtt tttgcctaca cttacgtgtt agacaagaac 5280 

tatgattttt ttttttaaag tactggtgtc accctttgcc tatatggtag agcaataatg 5340 

ctttttaaaa ataaacttct gaaaacccaa ggccaggtac tgcattctga atcagaatct 5400 

cgcagtgttt ctgtgaatag atttttttgt aaatatgacc tttaagatat tgtattatgt 5460 

aaaatatgta tatacctttt tttgtaggtc acaacaactc atttttacag agtttgtgaa 5520 

gctaaatatt taacattgtt gatttcagta agctgtgtgg tgaggctacc agtggaagag 5580 

acatcccttg acttttgtgg cctgggggag gggtagtgct ccacagcttt tccttcccca 5640 

ccccccagcc ttagatgcct cgctcttttc aatctcttaa tctaaatgct ttttaaagag 5700 

attatttgtt tagatgtagg cattttaatt ttttaaaaat tcctctacca gaactaagca 5760 

ctttgttaat ttggggggaa agaatagata tggggaaata aacttaaaaa aaaatcagga 5820 

atttaaaaaa acgagcaatt tgaagagaat cttttggatt ttaagcagtc cgaaataata 5880 

gcaattcatg ggctgtgtgt gtgtgtgtat gtgtgtgtgt gtgtgtgtat gtttaattat 5940 

gttacctttt catccccttt aggagcgttt tcagattttg gttgctaaga cctgaatccc 6000 

atattgagat ctcgagtaga atccttggtg tggtttctgg tgtctgctca gctgtcccct 6060 

cattctacta atgtgatgct ttcattatgt ccctgtggat tagaatagtg tcagttattt 6120 

cttaagtaac tcagtaccca gaacagccag ttttactgtg attcagagcc acagtctaac 6180 

tgagcacctt ttaaacccct ccctcttctg ccccctacca cttttctgct gttgcctctc 6240 

tttgacacct gttttagtca gttgggagga agggaaaaat caagtttaat tccctttatc 6300

tgggttaatt catttggttc aaatagttga cggaattggg tttctgaatg tctgtgaatt 6360 

tcagaggtct ctgctagcct tggtatcatt ttctagcaat aactgagagc cagttaattt 6420 

taagaatttc acacatttag ccaatctttc tagatgtctc tgaaggtaag atcatttaat 6480 

atctttgata tgcttacgag taagtgaatc ctgattattt ccagacccac caccagagtg 6540 

gatcttattt tcaaagcagt atagacaatt atgagtttgc cctctttccc ctaccaagtt 6600 

caaaatatat ctaagaaaga ttgtaaatcc gaaaacttcc attgtagtgg cctgtgcttt 6660 

tcagatagta tactctcctg tttggagaca gaggaagaac caggtcagtc tgtctctttt 6720 

tcagctcaat tgtatctgac ccttctttaa gttatgtgtg tggggagaaa tagaatggtg 6780 

ctcttatctt tcttgacttt aaaaaaatta ttaaaaacaa aaaaaaaata aa 6832

<210> 19
<211> 1438
<212> PRT

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: /note = 
synthetic construct 

<400> 19 

Met Ser Gly Leu Gly Glu Asn Leu Asp Pro Leu Ala Ser Asp Ser Arg

1 5 10 15

Lys Arg Lys Leu Pro Cys Asp Thr Pro Gly Gln Gly Leu Thr Cys Ser

20 25 30

Gly Glu Lys Arg Arg Arg Glu Gln Glu Ser Lys Tyr Ile Glu Glu Leu

35 40 45

Ala Glu Leu Ile Ser Ala Asn Leu Ser Asp Ile Asp Asn Phe Asn Val

50 55 60

Lys Pro Asp Lys Cys Ala Ile Leu Lys Glu Thr Val Arg Gln Ile Arg
65 70 75 80

Gln Ile Lys Glu Gln Gly Lys Thr Ile Ser Asn Asp Asp Asp Val Gln

85 90 95

Lys Ala Asp Val Ser Ser Thr Gly Gln Gly Val Ile Asp Lys Asp Ser

100 105 110

Leu Gly Pro Leu Leu Leu Gln Ala Leu Asp Gly Phe Leu Phe Val Val

115 120 125

Asn Arg Asp Gly Asn Ile Val Phe Val Ser Glu Asn Val Thr Gln Tyr

130 135 140

Leu Gln Tyr Lys Gln Glu Asp Leu Val Asn Thr Ser Val Tyr Asn Ile
145 150 155 160

Leu His Glu Glu Asp Arg Lys Asp Phe Leu Lys Asn Leu Pro Lys Ser

165 170 175

Thr Val Asn Gly Val Ser Trp Thr Asn Glu Thr Gln Arg Gln Lys Ser

180 185 190

His Thr Phe Asn Cys Arg Met Leu Met Lys Thr Pro His Asp Ile Leu

195 200 205

Glu Asp Ile Asn Ala Ser Pro Glu Met Arg Gln Arg Tyr Glu Thr Met

210 215 220

Gln Cys Phe Ala Leu Ser Gln Pro Arg Ala Met Met Glu Glu Gly Glu
225 230 235 240

Asp Leu Gln Ser Cys Met Ile Cys Val Ala Arg Arg Ile Thr Thr Gly

245 250 255

Glu Arg Thr Phe Pro Ser Asn Pro Glu Ser Phe Ile Thr Arg His Asp

260 265 270

Leu Ser Gly Lys Val Val Asn Ile Asp Thr Asn Ser Leu Arg Ser Ser

275 280 285

Met Arg Pro Gly Phe Glu Asp Ile Ile Arg Arg Cys Ile Gln Arg Phe

290 295 300

Phe Ser Leu Asn Asp Gly Gln Ser Trp Ser Gln Lys Arg His Tyr Gln
305 310 315 320

Glu Ala Tyr Leu Asn Gly His Ala Glu Thr Pro Val Tyr Arg Phe Ser

325 330 335

Leu Ala Asp Gly Thr Ile Val Thr Ala Gln Thr Lys Ser Lys Leu Phe

340 345 350

Arg Asn Pro Val Thr Asn Asp Arg His Gly Phe Val Ser Thr His Phe

355 360 365

Leu Gln Arg Glu Gln Asn Gly Tyr Arg Pro Asn Pro Asn Pro Val Gly

370 375 380

Gln Gly Ile Arg Pro Pro Met Ala Gly Cys Asn Ser Ser Val Gly Gly
385 390 395 400

Met Ser Met Ser Pro Asn Gln Gly Leu Gln Met Pro Ser Ser Arg Ala
405 410 415

Tyr Gly Leu Ala Asp Pro Ser Thr Thr Gly Gln Met Ser Gly Ala Arg

420 425 430

Tyr Gly Gly Ser Ser Asn Ile Ala Ser Leu Thr Pro Gly Pro Gly Met

435 440 445

Gln Ser Pro Ser Ser Tyr Gln Asn Asn Asn Tyr Gly Leu Asn Met Ser

450 455 460

Ser Pro Pro His Gly Ser Pro Gly Leu Ala Pro Asn Gln Gln Asn Ile
465 470 475 480

Met Ile Ser Pro Arg Asn Arg Gly Ser Pro Lys Ile Ala Ser His Gln

485 490 495

Phe Ser Pro Val Ala Gly Val His Ser Pro Met Ala Ser Ser Gly Asn

500 505 510

Thr Gly Asn His Ser Phe Ser Ser Ser Ser Leu Ser Ala Leu Gln Ala

515 520 525

Ile Ser Glu Gly Val Gly Thr Ser Leu Leu Ser Thr Leu Ser Ser Pro

530 535 540

Gly Pro Lys Leu Asp Asn Ser Pro Asn Met Asn Ile Thr Gln Pro Ser
545 550 555 560

Lys Val Ser Asn Gln Asp Ser Lys Ser Pro Leu Gly Phe Tyr Cys Asp

565 570 575

Gln Asn Pro Val Glu Ser Ser Met Cys Gln Ser Asn Ser Arg Asp His

580 585 590

Leu Ser Asp Lys Glu Ser Lys Glu Ser Ser Val Glu Gly Ala Glu Asn

595 600 605

Gln Arg Gly Pro Leu Glu Ser Lys Gly His Lys Lys Leu Leu Gln Leu

610 615 620

Leu Thr Cys Ser Ser Asp Asp Arg Gly His Ser Ser Leu Thr Asn Ser
625 630 635 640

Pro Leu Asp Ser Ser Cys Lys Glu Ser Ser Val Ser Val Thr Ser Pro

645 650 655

Ser Gly Val Ser Ser Ser Thr Ser Gly Gly Val Ser Ser Thr Ser Asn

660 665 670

Met His Gly Ser Leu Leu Gln Glu Lys His Arg Ile Leu His Lys Leu

675 680 685

Leu Gln Asn Gly Asn Ser Pro Ala Glu Val Ala Lys Ile Thr Ala Glu

690 695 700

Ala Thr Gly Lys Asp Thr Ser Ser Ile Thr Ser Cys Gly Asp Gly Asn
705 710 715 720

Val Val Lys Gln Glu Gln Leu Ser Pro Lys Lys Lys Glu Asn Asn Ala

725 730 735

Leu Leu Arg Tyr Leu Leu Asp Arg Asp Asp Pro Ser Asp Ala Leu Ser

740 745 750

Lys Glu Leu Gln Pro Gln Val Glu Gly Val Asp Asn Lys Met Ser Gln

755 760 765

Cys Thr Ser Ser Thr Ile Pro Ser Ser Ser Gln Glu Lys Asp Pro Lys

770 775 780

Ile Lys Thr Glu Thr Ser Glu Glu Gly Ser Gly Asp Leu Asp Asn Leu
785 790 795 800

Asp Ala Ile Leu Gly Asp Leu Thr Ser Ser Asp Phe Tyr Asn Asn Ser

805 810 815

Ile Ser Ser Asn Gly Ser His Leu Gly Thr Lys Gln Gln Val Phe Gln

820 825 830

Gly Thr Asn Ser Leu Gly Leu Lys Ser Ser Gln Ser Val Gln Ser Ile

835 840 845

Arg Pro Pro Tyr Asn Arg Ala Val Ser Leu Asp Ser Pro Val Ser Val

850 855 860

Gly Ser Ser Pro Pro Val Lys Asn Ile Ser Ala Phe Pro Met Leu Pro
865 870 875 880

Lys Gln Pro Met Leu Gly Gly Asn Pro Arg Met Met Asp Ser Gln Glu

885 890 895

Asn Tyr Gly Ser Ser Met Gly Gly Pro Asn Arg Asn Val Thr Val Thr

900 905 910

Gln Thr Pro Ser Ser Gly Asp Trp Gly Leu Pro Asn Ser Lys Ala Gly

915 920 925

Arg Met Glu Pro Met Asn Ser Asn Ser Met Gly Arg Pro Gly Gly Asp

930 935 940

Tyr Asn Thr Ser Leu Pro Arg Pro Ala Leu Gly Gly Ser Ile Pro Thr
945 950 955 960

Leu Pro Leu Arg Ser Asn Ser Ile Pro Gly Ala Arg Pro Val Leu Gln

965 970 975

Gln Gln Gln Gln Met Leu Gln Met Arg Pro Gly Glu Ile Pro Met Gly

980 985 990

Met Gly Ala Asn Pro Tyr Gly Gln Ala Ala Ala Ser Asn Gln Leu Gly

995 1000 1005

Ser Trp Pro Asp Gly Met Leu Ser Met Glu Gln Val Ser His Gly Thr

1010 1015 1020

Gln Asn Arg Pro Leu Leu Arg Asn Ser Leu Asp Asp Leu Val Gly Pro
1025 1030 1035 1040

Pro Ser Asn Leu Glu Gly Gln Ser Asp Glu Arg Ala Leu Leu Asp Gln

1045 1050 1055

Leu His Thr Leu Leu Ser Asn Thr Asp Ala Thr Gly Leu Glu Glu Ile

1060 1065 1070

Asp Arg Ala Leu Gly Ile Pro Glu Leu Val Asn Gln Gly Gln Ala Leu

1075 1080 1085

Glu Pro Lys Gln Asp Ala Phe Gln Gly Gln Glu Ala Ala Val Met Met

1090 1095 1100

Asp Gln Lys Ala Gly Leu Tyr Gly Gln Thr Tyr Pro Ala Gln Gly Pro
1105 1110 1115 1120

Pro Met Gln Gly Gly Phe His Leu Gln Gly Gln Ser Pro Ser Phe Asn

1125 1130 1135

Ser Met Met Asn Gln Met Asn Gln Gln Gly Asn Phe Pro Leu Gln Gly

1140 1145 1150

Met His Pro Arg Ala Asn Ile Met Arg Pro Arg Thr Asn Thr Pro Lys

1155 1160 1165

Gln Leu Arg Met Gln Leu Gln Gln Arg Leu Gln Gly Gln Gln Phe Leu

1170 1175 1180

Asn Gln Ser Arg Gln Ala Leu Glu Leu Lys Met Glu Asn Pro Thr Ala
1185 1190 1195 1200

Gly Gly Ala Ala Val Met Arg Pro Met Met Gln Pro Gln Gln Gly Phe

1205 1210 1215

Leu Asn Ala Gln Met Val Ala Gln Arg Ser Arg Glu Leu Leu Ser His

1220 1225 1230

His Phe Arg Gln Gln Arg Val Ala Met Met Met Gln Gln Gln Gln Gln

1235 1240 1245

Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln

1250 1255 1260

Gln Gln Gln Gln Gln Gln Gln Gln Thr Gln Ala Phe Ser Pro Pro Pro
1265 1270 1275 1280

Asn Val Thr Ala Ser Pro Ser Met Asp Gly Leu Leu Ala Gly Pro Thr

1285 1290 1295

Met Pro Gln Ala Pro Pro Gln Gln Phe Pro Tyr Gln Pro Asn Tyr Gly

1300 1305 1310

Met Gly Gln Gln Pro Asp Pro Ala Phe Gly Arg Val Ser Ser Pro Pro

1315 1320 1325

Asn Ala Met Met Ser Ser Arg Met Gly Pro Ser Gln Asn Pro Met Met

1330 1335 1340

Gln His Pro Gln Ala Ala Ser Ile Tyr Gln Ser Ser Glu Met Lys Gly
1345 1350 1355 1360

Trp Pro Ser Gly Asn Leu Ala Arg Asn Ser Ser Phe Ser Gln Gln Gln

1365 1370 1375

Phe Ala His Gln Gly Asn Pro Ala Val Tyr Ser Met Val His Met Asn

1380 1385 1390

Gly Ser Ser Gly His Met Gly Gln Met Asn Met Asn Pro Met Pro Met

1395 1400 1405

Ser Gly Met Pro Met Gly Pro Asp Gln Asn Thr Ala Asp Ile Ser Ala

1410 1415 1420

Pro Gly Pro Leu Lys Glu Thr Thr Val Gln Met Thr Leu His
1425 1430 1435

<210> 20 
<211> 4321 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: /note = 
synthetic construct 

<400> 20 

cgagatcccg gggagccagc ttgctgggag agcgggacgg tccggagcaa gcccacaggc 60 

agaggaggcg acagagggaa aaagggccga gctagccgct ccagtgctgt acaggagccg 120 

aagggacgca ccacgccagc cccagcccgg ctccagcgac agccaacgcc tcttgcagcg 180 

cggcggcttc gaagccgccg cccggagctg ccctttcctc ttcggtgaag tttttaaaag 240 

ctgctaaaga ctcggaggaa gcaaggaaag tgcctggtag gactgacggc tgcctttgtc 300 

ctcctcctct ccaccccgcc tccccccacc ctgccttccc cccctccccc gtcttctctc 360 

ccgcagctgc ctcagtcggc tactctcagc caacccccct caccaccctt ctccccaccc 420

gcccccccgc ccccgtcggc ccagcgctgc cagcccgagt ttgcagagag gtaactccct 480

ttggctgcga gcgggcgagc tagctgcaca ttgcaaagaa ggctcttagg agccaggcga 540

ctggggagcg gcttcagcac tgcagccacg acccgcctgg ttagaattcc ggcggagaga 600

accctctgtt ttcccccact ctctctccac ctcctcctgc cttccccacc ccgagtgcgg 660

agcagagatc aaaagatgaa aaggcagtca ggtcttcagt agccaaaaaa caaaacaaac 720

aaaaacaaaa aagccgaaat aaaagaaaaa gataataact cagttcttat ttgcacctac 780 

ttcagtggac actgaatttg gaaggtggag gattttgttt ttttctttta agatctgggc 840 

atcttttgaa tctacccttc aagtattaag agacagactg tgagcctagc agggcagatc 900

ttgtccaccg tgtgtcttct tctgcacgag actttgaggc tgtcagagcg ctttttgcgt 960

ggttgctccc gcaagtttcc ttctctggag cttcccgcag gtgggcagct agctgcagcg 1020

actaccgcat catcacagcc tgttgaactc ttctgagcaa gagaagggga ggcggggtaa 1080 

gggaagtagg tggaagattc agccaagctc aaggatggaa gtgcagttag ggctgggaag 1140 

ggtctaccct cggccgccgt ccaagaccta ccgaggagct ttccagaatc tgttccagag 1200 

cgtgcgcgaa gtgatccaga acccgggccc caggcaccca gaggccgcga gcgcagcacc 1260

tcccggcgcc agtttgctgc tgctgcagca gcagcagcag cagcagcagc agcagcagca 1320

gcagcagcag cagcagcagc agcagcaaga gactagcccc aggcagcagc agcagcagca 1380 

gggtgaggat ggttctcccc aagcccatcg tagaggcccc acaggctacc tggtcctgga 1440 

tgaggaacag caaccttcac agccgcagtc ggccctggag tgccaccccg agagaggttg 1500 

cgtcccagag cctggagccg ccgtggccgc cagcaagggg ctgccgcagc agctgccagc 1560 

acctccggac gaggatgact cagctgcccc atccacgttg tccctgctgg gccccacttt 1620 

ccccggctta agcagctgct ccgctgacct taaagacatc ctgagcgagg ccagcaccat 1680

gcaactcctt cagcaacagc agcaggaagc agtatccgaa ggcagcagca gcgggagagc 1740

gagggaggcc tcgggggctc ccacttcctc caaggacaat tacttagggg gcacttcgac 1800

catttctgac aacgccaagg agttgtgtaa ggcagtgtcg gtgtccatgg gcctgggtgt 1860

ggaggcgttg gagcatctga gtccagggga acagcttcgg ggggattgca tgtacgcccc 1920

acttttggga gttccacccg ctgtgcgtcc cactccttgt gccccattgg ccgaatgcaa 1980

aggttctctg ctagacgaca gcgcaggcaa gagcactgaa gatactgctg agtattcccc 2040

tttcaaggga ggttacacca aagggctaga aggcgagagc ctaggctgct ctggcagcgc 2100

tgcagcaggg agctccggga cacttgaact gccgtctacc ctgtctctct acaagtccgg 2160

agcactggac gaggcagctg cgtaccagag tcgcgactac tacaactttc cactggctct 2220 

ggccggaccg ccgccccctc cgccgcctcc ccatccccac gctcgcatca agctggagaa 2280 

cccgctggac tacggcagcg cctgggcggc tgcggcggcg cagtgccgct atggggacct 2340 

ggcgagcctg catggcgcgg gtgcagcggg acccggttct gggtcaccct cagccgccgc 2400

ttcctcatcc tggcacactc tcttcacagc cgaagaaggc cagttgtatg gaccgtgtgg 2460

tggtggtggg ggtggtggcg gcggcggcgg cggcggcggc ggcggcggcg gcggcggcgg 2520

cggcggcggc gaggcgggag ctgtagcccc ctacggctac actcggcccc ctcaggggct 2580 

ggcgggccag gaaagcgact tcaccgcacc tgatgtgtgg taccctggcg gcatggtgag 2640 

cagagtgccc tatcccagtc ccacttgtgt caaaagcgaa atgggcccct ggatggatag 2700 

ctactccgga ccttacgggg acatgcgttt ggagactgcc agggaccatg ttttgcccat 2760 

tgactattac tttccacccc agaagacctg cctgatctgt ggagatgaag cttctgggtg 2820

tcactatgga gctctcacat gtggaagctg caaggtcttc ttcaaaagag ccgctgaagg 2880 

gaaacagaag tacctgtgcg ccagcagaaa tgattgcact attgataaat tccgaaggaa 2940 

aaattgtcca tcttgtcgtc ttcggaaatg ttatgaagca gggatgactc tgggagcccg 3000

gaagctgaag aaacttggta atctgaaact acaggaggaa ggagaggctt ccagcaccac 3060 

cagccccact gaggagacaa cccagaagct gacagtgtca cacattgaag gctatgaatg 3120 

tcagcccatc tttctgaatg tcctggaagc cattgagcca ggtgtagtgt gtgctggaca 3180 

cgacaacaac cagcccgact cctttgcagc cttgctctct agcctcaatg aactgggaga 3240 

gagacagctt gtacacgtgg tcaagtgggc caaggccttg cctggcttcc gcaacttaca 3300 

cgtggacgac cagatggctg tcattcagta ctcctggatg gggctcatgg tgtttgccat 3360 

gggctggcga tccttcacca atgtcaactc caggatgctc tacttcgccc ctgatctggt 3420

tttcaatgag taccgcatgc acaagtcccg gatgtacagc cagtgtgtcc gaatgaggca 3480 

cctctctcaa gagtttggat ggctccaaat caccccccag gaattcctgt gcatgaaagc 3540 

actgctactc ttcagcatta ttccagtgga tgggctgaaa aatcaaaaat tctttgatga 3600 

acttcgaatg aactacatca aggaactcga tcgtatcatt gcatgcaaaa gaaaaaatcc 3660

cacatcctgc tcaagacgct tctaccagct caccaagctc ctggactccg tgcagcctat 3720

tgcgagagag ctgcatcagt tcacttttga cctgctaatc aagtcacaca tggtgagcgt 3780

ggactttccg gaaatgatgg cagagatcat ctctgtgcaa gtgcccaaga tcctttctgg 3840

gaaagtcaag cccatctatt tccacaccca gtgaagcatt ggaaacccta tttccccacc 3900 

ccagctcatg ccccctttca gatgtcttct gcctgttata actctgcact actcctctgc 3960 

agtgccttgg ggaatttcct ctattgatgt acagtctgtc atgaacatgt tcctgaattc 4020 

tatttgctgg gctttttttt tctctttctc tcctttcttt ttcttcttcc ctccctatct 4080

aaccctccca tggcaccttc agactttgct tcccattgtg gctcctatct gtgttttgaa 4140

tggtgttgta tgcctttaaa tctgtgatga tcctcatatg gcccagtgtc aagttgtgct 4200 

tgtttacagc actactctgt gccagccaca caaacgttta cttatcttat gccacgggaa 4260

gtttagagag ctaagattat ctggggaaat caaaacaaaa aacaagcaaa caaaaaaaaa 4320 

a 4321 



<210> 21 
<211> 919 
<212> PRT 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence : /note = 
synthetic construct 



<400> 21 



Met 


Glu 


Val 


Gln


Leu 


Gly 


Leu 


Gly 


1 








5 






Lys 


Thr 


Tyr 


Arg 
20 


Gly 


Ala 


Phe 


Gln


Val


Ile


Gln
35 


Asn 


Pro 


Gly 


Pro 


Arg 
40 


Pro 


Pro 
50 


Gly 


Ala 


Ser 


Leu 


Leu 
55 


Leu 


Gln


Gln


Gln


Gln


Gln


Gln


Gln


Gln


65 










70 






Ser